Links to the main figures and tables referenced in the main paper:
Please note that the figure and table numbers used here do not match those used in the paper or its accompanying Supplementary Information.
This document (irrespective of its format, most probably HTML or PDF) resulted from compiling the corresponding Rmarkdown script and contains all the results and plots supporting the paper (it is referred to as the Analysis report in the main paper). This Rmarkdown script also generates the panels of the supplementary figures and the content of the supplementary tables contained in the paper’s Supplementary Information. The primary data are available upon request from the corresponding author (Manxiang Wu, manxiangwu1022@163.com), as instructed in the paper, but all the code needed to reproduce this document is available in the GitHub repository https://github.com/ddediu/tone_ax_dong.
Please note that this Rmarkdown script caches some very expensive computations in the ./cached_results directory, but it can be manually forced to recompute everything (by setting the variable FORCE_COMPUTE_ALL to TRUE); it also forces a full recomputation if any of the input files (in the directory ./input_files) was changed, deleted or added (or if this is the first time the script is run). However, it is highly recommended not to run such a full recomputation during the knitting of the Rmarkdown, but instead in a “normal” R session using, for example, the “Run ▾” → “Run all” menu in RStudio (knitting the whole thing seems to generate crashes due to memory issues). Also, this full recomputation should be done on a powerful machine (with at least 32 GB of RAM and a 4-core CPU) running Linux or macOS (to fully use multicore parallelism; moreover, this script was not tested on Windows) and it may take a while (about 3 hours on an AMD Ryzen 7 3700X with 64 GB of RAM), but subsequent knittings or small changes can be run on a “normal” machine, as the expensive results are cached for later use.
This document uses the following font and color conventions: inline code is shown using fixed font text; R functions and expressions are shown using fixed font text in clearly marked boxes. The full information about the version of R (R Core Team, 2023), the packages, and the hardware and software platform used to obtain this document is given in the Section Session information at the end of this document.
We collected data from native speakers of a Southern dialect of Kam, described in detail in (Wu, 2018), probably Glottolog sout2741, which is characterized by a very complex tone system (see Wu, 2018, pp. 28–36) with 10 phonemic (i.e., contrastive) tones realized as 15 phonetic tones (and affected by various tone sandhi rules and language contact-induced ongoing changes). As expected, WALS assigns a “Complex tone system” to this variety (see “Chapter 13A” for Language Dong (Southern)); unfortunately, neither PHOIBLE nor LAPSyD seems to contain any information about it.
In total, we collected usable data from 492 unique participants, from which we further excluded 2 participants who reported hearing problems (no participant reported brain or cognitive impairment), leaving a total of 490 participants in the sample.
Concerning self-declared gender, there are 331 (67.6%) self-declared females and 159 (32.4%) self-declared males:
Figure 1. Distribution of gender in the
sample. Figure generated using R version 4.3.3
(2024-02-29)
Figure 2. Distribution of age overall (thick
black curve) and by gender (colored transparent curves) in the
sample. Figure generated using R version 4.3.3
(2024-02-29)
It can be seen that the sample has 2.1 times more females than males (the χ2 test against the expected 50%:50% distribution is highly significant: χ2(1) = 60.4, p=7.84×10-15), but the ages are distributed in similar ways between the two genders, with a bi-modal distribution suggesting two age groups: one composed of adolescents and young adults (centered around 20 years of age and ranging between the minimum of 15 and about 30 years old) and the other (“adults”) centered around late 40s.
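The goodness-of-fit test above can be reproduced from the reported counts alone; a minimal sketch:

```r
# Chi-squared goodness-of-fit test of the observed gender counts
# against the expected 50%:50% split (counts as reported above)
gender_counts <- c(female = 331, male = 159)
chisq.test(gender_counts, p = c(0.5, 0.5))
# X-squared ≈ 60.4, df = 1, p ≈ 7.8e-15 (matching the values in the text)
```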
We used the location as a proxy for any relevant socio-linguistic dimensions of variation, and we ended up with participants from 5 locations:
| B | A | C | D | E |
|---|---|---|---|---|
| 237 | 227 | 1 | 1 | 1 |
The vast majority of participants come from two neighboring locations, A and B (about 8 km apart), speaking very similar dialects of Kam (Manxiang Wu, p.c.); therefore we collapsed the remaining locations into an “other” category. Please note that for 23 (4.7%) participants this information is missing.
We were also able to retrieve some information concerning familial relationships for 91 (18.6%) participants, grouped in 28 nuclear families across a maximum of 3 generations: generation 0 comprises the youngest members, generation 1 their parents, and generation 2 their grandparents:
Figure 3. Distribution of the participants with
information about family by generation: 0=youngest (black), 1=their
parents (gray) and 2=their grandparents (light gray). The families are
identified with an arbitrary unique numerical ID. Figure generated using
R version 4.3.3
(2024-02-29)
It can be seen that there is only one participant in generation 2, so we collapsed generations 1 and 2, resulting in a binary split into a “young” and an “older” generation.
| young | older |
|---|---|
| 72 | 19 |
We considered three covariates here (as age has already been covered above, we focus on the remaining two).
Figure 4. Distribution of music_years overall
(thick black curve) and by gender (colored transparent curves)
in the sample. Figure generated using R version 4.3.3
(2024-02-29)
It is clear that, in our sample, music_years does not capture any interesting pattern of inter-individual variation, so we will ignore it in the following analyses.
Figure 5. Distribution of education_years by
gender in the sample. Figure generated using R version 4.3.3
(2024-02-29)
Figure 6. Distribution of education_years by
location in the sample. NA means that the location
information was not available. Figure generated using R version 4.3.3
(2024-02-29)
Overall, gender makes a significant difference (t(394.2)=-6.4, p=3.5×10-10; Mann-Whitney W=1.75565×10^4, p=1.75×10-9), with males having 2.35 years of formal education more than females on average (i.e., 7.64 for males vs 5.29 for females).
Focusing on the two locations that comprise the majority of the participants, location A has significantly higher educational levels than location B (t(461.9)=4, p=8.05×10-5; Mann-Whitney W=3.25335×10^4, p=8.24×10-5), with the participants from location A having 1.53 years of formal education more than the participants from B on average (i.e., 6.67 vs 5.14).
We performed the linear regression of education_years on age, gender and their interaction, and we found that this model behaves well (diagnostic plots not shown), that it explains adjusted R2 = 48.7% of the variance, and that all terms are significant: age has a highly significant negative main effect (β = -0.264, p = 7.22×10-60), gender has a very large and highly significant main effect with the males having fewer years of education than females (β = -3.758, p = 8.23×10-5), but there is a highly significant interaction between the two (β = 0.138, p = 2.92×10-9) that offsets the main negative effect of gender into an advantage for males over females:
Figure 7. Predictive plot of the linear regression of
education_years on gender, age and their
interaction. Figure generated using R version 4.3.3
(2024-02-29)
The working memory task consists of 15 trials; in each trial a sequence of colors (each color appears only once in a trial) is shown to the participant, who has to reproduce the colors in the correct order, the score being the number of colors reproduced in the correct position. For example, trial 1 shows “red”, “green” and “blue”; if a participant reproduces “yellow”, “green”, “red”, her score is 1 (from “green”). The trials are the same across participants and vary from 3 to 7 colors, as shown below:
| Trial | Length | Color 1 | Color 2 | Color 3 | Color 4 | Color 5 | Color 6 | Color 7 |
|---|---|---|---|---|---|---|---|---|
| 1 | 3 | red | green | blue | ||||
| 2 | 3 | black | purple | yellow | ||||
| 3 | 3 | gray | green | black | ||||
| 4 | 4 | red | blue | purple | gray | |||
| 5 | 4 | yellow | black | green | blue | |||
| 6 | 4 | green | red | black | yellow | |||
| 7 | 5 | blue | black | gray | yellow | red | ||
| 8 | 5 | gray | purple | yellow | green | blue | ||
| 9 | 5 | black | red | blue | gray | green | ||
| 10 | 6 | green | black | purple | blue | gray | yellow | |
| 11 | 6 | yellow | purple | black | red | green | gray | |
| 12 | 6 | gray | purple | blue | red | green | yellow | |
| 13 | 7 | red | green | blue | black | purple | yellow | gray |
| 14 | 7 | blue | gray | black | green | red | purple | yellow |
| 15 | 7 | yellow | red | blue | green | gray | black | purple |
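The positional scoring rule described above can be sketched as a short function (the function name is ours, not from the original script):

```r
# Score a working-memory response: the number of colors reproduced
# in the correct position (positional match, as described in the text)
score_trial <- function(target, response) {
  n <- min(length(target), length(response))
  sum(target[seq_len(n)] == response[seq_len(n)])
}

# Example from the text: trial 1 is red-green-blue;
# reproducing yellow-green-red scores 1 (only "green" is in place)
score_trial(c("red", "green", "blue"), c("yellow", "green", "red"))
# [1] 1
```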
Figure 8. Distribution of the working memory
scores across the trials by gender (showing the actual counts with
the females stacked on top of the males). Figure generated using R version 4.3.3
(2024-02-29)
gender makes a significant difference (t(4798.1)=-4.3, p=2.04×10-5; Mann-Whitney W=5.521666×10^6, p=1.88×10-6), with males scoring slightly higher than females across trials, by 0.17 points on average (i.e., 2.29 for males vs 2.11 for females).
The three trials with the same length visually seem to have similar behaviors, but there are nevertheless significant differences between them:
| Length | Trials | ANOVA by trial | Signif. pairwise diffs. |
|---|---|---|---|
| 3 | 01, 02, 03 | F(2, 1467)=13.54, p=1.5×10-6 | trial 01 is easier |
| 4 | 04, 05, 06 | F(2, 1467)=0.25, p=0.782 | all trials are similar |
| 5 | 07, 08, 09 | F(2, 1467)=13.18, p=2.12×10-6 | trial 09 is easier |
| 6 | 10, 11, 12 | F(2, 1467)=3.76, p=0.023 | trial 12 is easier than trial 10 |
| 7 | 13, 14, 15 | F(2, 1467)=13.05, p=2.41×10-6 | trial 14 is harder |
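Each row of this table corresponds to a one-way ANOVA of the score on the trial ID within one trial length, followed by pairwise comparisons; a sketch on simulated data (toy scores stand in for the real ones; note the matching residual df of 1467):

```r
set.seed(1)
# Toy long-format data: 490 participants x 3 trials of the same length
# (illustrative values only; the real analysis uses the actual scores)
d <- data.frame(
  trial = factor(rep(c("01", "02", "03"), each = 490)),
  score = c(rbinom(490, 3, 0.8), rbinom(490, 3, 0.7), rbinom(490, 3, 0.7)))
fit <- aov(score ~ trial, data = d)
summary(fit)                                  # F test by trial, df = (2, 1467)
pairwise.t.test(d$score, d$trial, p.adjust.method = "holm")  # which trials differ
```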
We conducted both a Principal Component Analysis (PCA) and an Exploratory Factor Analysis (EFA) on all trials together, and we found that there seems to be a single factor.
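The PCA step can be sketched with base R alone; here simulated single-factor data stand in for the real 490 × 15 score matrix, so the exact percentages will differ:

```r
set.seed(42)
# Simulated stand-in for the participants x 15 trials score matrix:
# a single latent ability plus independent noise (one-factor structure)
ability <- rnorm(490)
wm <- sapply(1:15, function(i) ability + rnorm(490))
pca <- prcomp(wm, center = TRUE, scale. = TRUE)
# With a single underlying factor, PC1 dominates and the remaining
# PCs share the residual variance roughly equally
summary(pca)$importance["Proportion of Variance", 1:3]
```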
For PCA, PC1 explains 40.2% of the variance, followed by PC2 which explains only 6.1%, suggesting that all trials load on a single latent variable:
Figure 9. Screeplot of the PCA of all the working
memory trials together. Figure generated using R version 4.3.3
(2024-02-29)
Figure 10. Loading of the working memory trials on the
first 2 PCs. Figure generated using R version 4.3.3
(2024-02-29)
Figure 11. The participants plotted on the first 2 PCs,
colored by their qualities of representation (cos2). Figure
generated using R
version 4.3.3 (2024-02-29)
For EFA, all the preliminary tests suggest that factor analysis is appropriate (Kaiser-Meyer-Olkin = 0.95 > 0.60; Bartlett’s test is significant: χ2(105)=2297.1, p=0; and det(cor(data))=0.0086 > 0) and all the recommended methods for finding the appropriate number of factors suggest that 1 factor is enough:
Figure 12. Screeplot of the observed, simulated and
randomized data with 1 standard deviation error bars (as generated by
fa.parallel()). Figure generated using R version 4.3.3
(2024-02-29)
Figure 13. Number of factors as suggested by the VSS
criterion (top left), the complexity of the solution (top right), BIC
(bottom left) and Root Mean Residual (bottom right), as implemented by
nfactors(). Figure generated using R version 4.3.3
(2024-02-29)
Figure 14. Loadings of the variables in the 1-factor
model. Figure generated using R version 4.3.3
(2024-02-29)
Parallel analysis suggests that the number of factors = 1 and the number of components = NA
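The preliminary factorability checks reported above can be obtained with the psych package; a sketch on simulated single-factor data (the real script presumably runs these on the actual trial scores):

```r
library(psych)  # KMO(), cortest.bartlett(), fa.parallel(), nfactors()
set.seed(1)
ability <- rnorm(490)                                 # simulated latent ability
wm <- sapply(1:15, function(i) ability + rnorm(490))  # stand-in score matrix
KMO(wm)                                  # Kaiser-Meyer-Olkin; > 0.60 is adequate
cortest.bartlett(cor(wm), n = nrow(wm))  # Bartlett's test of sphericity
det(cor(wm))                             # determinant of the correlation matrix (> 0)
nfactors(wm)                             # several criteria for the number of factors
```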
We also implemented a Confirmatory Factor Analysis
(CFA) with all trials loading on a single latent wm
variable, and we found that while the model formally does not fit the
data (χ2(133)=90.0, p=0.0022 ≤ 0.05), its
fit indices are acceptable (CFI=0.98, TLI=0.98, RFI=0.93, RMSEA=0.03) and the
path coefficients suggest that all trials load in similar ways on the
wm latent:
Figure 15. Confirmatory factor analysis (CFA) of the
working memory trials with a single latent factor wm. Figure
generated using R
version 4.3.3 (2024-02-29)
Given all these, it makes sense to compute a total score from all the trials (i.e., using the same weight of 1.0); we further normalized it between its minimum possible score of 0.0 and its maximum of 3 × (3+4+5+6+7) = 75 (this variable will be denoted as wm_norm).
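The normalization can be sketched directly (toy scores only; the trial lengths are those from the table above):

```r
set.seed(7)
trial_lengths <- rep(3:7, each = 3)               # 3 trials each of lengths 3..7
# Toy score matrix: 4 participants x 15 trials, scores bounded by trial length
# (illustrative values only)
wm <- t(replicate(4, rbinom(15, trial_lengths, 0.6)))
max_score <- 3 * sum(3:7)                         # maximum possible total score = 75
wm_norm <- rowSums(wm) / max_score                # normalized between 0 and 1
round(wm_norm, 2)
```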
This variable is distributed as follows:
Figure 16. A: distribution of
normalized working memory score (wm_norm) overall
(thick black curve) and by gender (colored transparent curves)
in the sample. B: relationship between wm_norm
and age by gender with linear regression lines (and
95%CIs). C: Relationship between wm_norm and
education_years by gender with linear regression lines
(and 95%CIs). Figure generated using R version 4.3.3
(2024-02-29)
We performed the linear regression of wm_norm on age, the number of years of formal education (education_years), gender (the reference level being females) and all their interactions, and, following manual simplification, we found that this model behaves well (diagnostic plots not shown), that it explains adjusted R2 = 50.6% of the variance, and that age has a highly significant negative effect (β = -0.006, p = 5.62×10-17), education_years has a highly significant positive effect (β = 0.021, p = 3.95×10-22), and gender shows a significant difference between males and females, with males having an overall smaller score than females (β = -0.032, p = 0.025).
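This regression-plus-simplification corresponds to standard `lm()` fits; a sketch on simulated data with the same column names (the coefficients will of course not match the reported ones):

```r
set.seed(3)
# Simulated stand-in data with the same column names as in the text
d <- data.frame(age             = runif(490, 15, 70),
                education_years = rpois(490, 6),
                gender          = factor(sample(c("female", "male"), 490,
                                                replace = TRUE)))
d$wm_norm <- pmin(pmax(0.8 - 0.006 * d$age + 0.02 * d$education_years +
                         rnorm(490, sd = 0.1), 0), 1)
m_full  <- lm(wm_norm ~ age * education_years * gender, data = d)  # all interactions
m_simpl <- lm(wm_norm ~ age + education_years + gender, data = d)  # main effects only
summary(m_simpl)$adj.r.squared  # adjusted R^2 of the simplified model
anova(m_simpl, m_full)          # is the simplification justified?
```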
The tone task is an AX task in which the participant is presented, in a given trial, with a pair of syllables that may differ only in tone, and has to decide if the two syllables are the “same” or “different”. For a given tone pair (let’s say “a” and “b”), there are two syllables with different segmental content (let’s say, “A” and “B”), resulting in the following four syllable+tone combinations: Aa, Ab, Ba and Bb. With these, we have the following possible trials:
Each trial was repeated twice and the response to a given trial was scored as “correct” if the tones were different and the response was “different”, or if the tones were the same and the response was “same”, and was scored as “incorrect” otherwise. Each participant was presented with a random (unique) order of the trials. Please note that some of the possible trials were not included in the task, as the participants showed ceiling effects during the pilot study.
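The scoring rule reduces to a single comparison; a sketch (the function name is ours):

```r
# A response is "correct" iff it matches the ground truth of the trial:
# "different" when the two tones differ, "same" when they are identical
score_ax <- function(tone1, tone2, response) {
  truth <- if (tone1 == tone2) "same" else "different"
  response == truth
}
score_ax("a", "b", "different")  # TRUE:  tones differ, heard as different
score_ax("a", "b", "same")       # FALSE: tones differ, heard as same
```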
We used the 9 (of the 10) phonological tones in the language that occur in unchecked syllables (Wu, 2018), represented here by letters:
Likewise, in the name of brevity, we denote the segmental content of the syllables using CAPITAL letters, as follows:
The actual stimuli used are listed below:
Please note that stimuli sem35:sem335 (Ist), sɐm35:sɐm335 (Hst) are considered as “difficult” by the task designers in the sense that it is hard to hear the difference in tones even for these highly trained speakers of a tone language (Manxiang Wu, p.c.).
We begin this analysis based on the “6 steps”
approach of Dima (2018) and
the accompanying R code available at https://github.com/alexadima/6-steps-protocol.
Please note that we will use a short notation for the items, composed of the segment’s CAPITAL letter, followed by the one (for ‘same’) or two (for ‘different’) letter tone notation, and the presentation number (e.g., Its2 is the 2nd presentation of the ‘different’ item sem335:sem35).
Figure 17. Endorsement frequencies by item (items
ordered by % correct responses). Figure generated using R version 4.3.3
(2024-02-29)
Figure 18. Correlation matrix between items. Figure
generated using R
version 4.3.3 (2024-02-29)
Figure 19. Histogram of the tetrachoric correlations
between different items. Figure generated using R version 4.3.3
(2024-02-29)
Figure 20. Hierarchical clustering of the items using 1
- tetrachoric correlations. Figure generated using R version 4.3.3
(2024-02-29)
Figure 21. Mean tetrachoric correlation with the other
items. Figure generated using R version 4.3.3
(2024-02-29)
It can be seen that the following items seem “weird”:
Interestingly, they seem to form meaningful groups:
the 4 “different” items involving segment B (“koi”) and tones h (33) and x (212) in both orders and both presentations; they also have very low % correct responses (around 22%) and also have correlations between them (tetrachoric rho’s between 0.28 and 0.39, with mean = 0.33 and sd = 0.04) → this suggests that tones h and x are very hard to distinguish when paired with segment B;
the 4 “different” items involving segment H (“sɐm”) and tones s (35) and t (335) in both orders and both presentations; they also have very low % correct responses (around 17%) and also have correlations between them (tetrachoric rho’s between 0.29 and 0.44, with mean = 0.38 and sd = 0.05) → this suggests that tones s and t are very hard to distinguish when paired with segment H;
the 4 “different” items involving segment I (“sem”) and tones s (35) and t (335) in both orders and both presentations; they also have very low % correct responses (around 17%) and also have correlations between them (tetrachoric rho’s between 0.21 and 0.36, with mean = 0.29 and sd = 0.05) → this suggests that tones s and t are very hard to distinguish when paired with segment I;
the 4 “different” items involving segment K (“səu”) and tones k (332) and l (52) in both orders and both presentations (please note that technically Kkl1 has a very small positive average correlation); they also have very low % correct responses (around 30%) and also have correlations between them (tetrachoric rho’s between 0.2 and 0.41, with mean = 0.29 and sd = 0.08) → this suggests that tones k and l are very hard to distinguish when paired with segment K;
the 4 “different” items involving segment L (“som”) and tones l (52) and p (452) in both orders and both presentations; they also have very low % correct responses (around 25%) and also have correlations between them (tetrachoric rho’s between 0.23 and 0.41, with mean = 0.3 and sd = 0.08) → this suggests that tones l and p are very hard to distinguish when paired with segment L;
Dhc1 (involving stimuli pju33 and pju23) basically has an average correlation of 0.0, while Dhc2 and Dch1 have a very small positive correlation, but Dch2 has a large positive correlation, suggesting that this may be a different case;
there are only 4 real words involved in these “special” items: sɐm35 for Hst1, sɐm35 for Hst2, sɐm35 for Hts1, sɐm35 for Hts2.
Moreover:
there is only one “same” item (Mc4, i.e., the 4th presentation of stimulus teu23) that has a low positive average correlation;
the other “different” stimuli involving B (“koi”) (i.e., Bsv1, Bsv2, Bvs1, Bvs2) have around 86% correct responses, and also have tetrachoric correlations between them between 0.41 and 0.59, with mean = 0.49 and sd = 0.06;
there are no other “different” stimuli involving H (“sɐm”);
the other “different” stimuli involving I (“sem”) (i.e., Icp1, Icp2, Iks1, Iks2, Ipc1, Ipc2, Ipv1, Ipv2, Ipx1, Ipx2, Isk1, Isk2, Ivp1, Ivp2, Ixp1, Ixp2) have around 82% correct responses, and also have tetrachoric correlations between them between 0.17 and 0.62, with mean = 0.42 and sd = 0.08;
there are no other “different” stimuli involving K (“səu”);
the other “different” stimuli involving L (“som”) (i.e., Lkl1, Lkl2, Llk1, Llk2, Lpv1, Lpv2, Lvp1, Lvp2) have around 68% correct responses, and also have tetrachoric correlations between them between 0.21 and 0.47, with mean = 0.36 and sd = 0.06;
there are no other “different” stimuli involving tones h (33) and x (212);
there are no other “different” stimuli involving tones s (35) and t (335);
the other “different” stimuli involving tones k (332) and l (52) (i.e., Lkl1, Lkl2, Llk1, Llk2) have around 64% correct responses, and also have tetrachoric correlations between them between 0.32 and 0.4, with mean = 0.36 and sd = 0.03;
there are no other “different” stimuli involving tones l (52) and p (452).
Taken together, these suggest that the five classes of “different” stimuli listed above (i.e., the permutations of Bhx, Hst, Ist, Kkl and Llp) tend to be massively misinterpreted by the participants (resulting in “incorrect” responses). However, we are missing here the crucial items involving the H and K segments and other tone pairs, and the hx, st and lp pairs of tones involving other segments, to be able to speculate whether this is related to the particular combination of segments and tone pairs, or to the segments and/or the tone pairs themselves. It remains an interesting question why these “different” items (10 of them, i.e., 19.2%) behave differently from the other “different” and from virtually all the “same” items. Apparently, these “incorrect” responses persisted even when some of the participants were provided with explicit feedback by the experimenters (Manxiang Wu, p.c.), suggesting that these perceptions are “real” and not due to inattention or fatigue. Moreover, please note that Ist and Hst are specifically marked as “difficult” by the task creators, suggesting that it is the tone pair st that is indeed hard to perceive as being different.
It is important to note that there is no tendency for the weird items to contain real words in the language (the table below shows percentages of items that have the corresponding properties):
| is.same | is.word | is.weird = FALSE | is.weird = TRUE |
|---|---|---|---|
| FALSE | FALSE | 39.1 | 8.7 |
| FALSE | TRUE | 6.5 | 2.2 |
| TRUE | FALSE | 40.2 | 0 |
| TRUE | TRUE | 3.3 | 0 |
Fisher's Exact Test for Count Data
data: table(d_all_items[!d_all_items$is.same, c("is.word", "is.weird")])
p-value = 0.6415
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.1249375 10.6175357
sample estimates:
odds ratio
1.487323
While most “same” and all “different” items are presented twice, some “same” items are presented more often: Dc (4), Ec (4), El (4), Ex (4), Ip (6), Is (4), Ll (4), Lp (4), Mc (4), Mh (4), Ml (4), with the maximum number of presentations being 6. The question, then, is: do later presentations differ from the earlier ones?
Figure 22. Successive presentations for the ‘same’
items: % correct and tetrachoric correlation between the current and the
previous presentation. Figure generated using R version 4.3.3
(2024-02-29)
Figure 23. Successive presentations for the ‘different’
items: % correct and tetrachoric correlation between the current and the
previous presentation. Figure generated using R version 4.3.3
(2024-02-29)
We fitted a beta regression model (using glmmTMB) of the
% correct responses on the interaction between presentation
number (between 1 and a maximum of 6, varying by item) and the
item type (“same” or “different”). Manual model simplification
shows that the interaction and the presentation number do not
matter (χ2(2)=0.4, p=0.839, ΔAIC=-3.6), but
item type does (χ2(3)=56.5,
p=3.21×10-12, ΔAIC=50.5), with the “same” items
having an overall higher % of correct responses than the “different”
items (Δ%correct = 21.0%, p=2.51×10-15). (There is
some overdispersion in the model, 1.32, p=0.016, but probably
not sufficient to qualitatively change these results.) The same lack of
a significant effect for presentation number is shown
separately for the ‘same’ items only (p=0.366) and for the
‘different’ items only (p=0.698).
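The nested model comparison above corresponds to beta-regression fits such as the following sketch with glmmTMB (the data frame and its column names are assumed here, with toy values standing in for the real per-item percentages):

```r
library(glmmTMB)  # beta_family() handles proportions strictly in (0, 1)
set.seed(9)
# Toy per-item data: % correct by presentation number and item type
items <- data.frame(
  presentation = rep(1:2, times = 52),
  type         = factor(rep(c("same", "different"), each = 52)),
  p_correct    = c(runif(52, 0.7, 0.95), runif(52, 0.5, 0.85)))
m_full <- glmmTMB(p_correct ~ presentation * type, data = items,
                  family = beta_family())
m_type <- glmmTMB(p_correct ~ type, data = items, family = beta_family())
anova(m_type, m_full)  # LR test: do presentation and the interaction matter?
```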
Likewise, the order of the tones for the ‘different’ items does not seem to matter (a one-sample t-test against 0 of the differences between the two orders across the items gives t(25)=1.5, p=0.156).
Thus, there seems to be no systematic effects of successive presentations of the same item, or of the order of the two tones for the ‘different’ items, on the percent of correct answers.
Figure 24. MSA: histogram of the number of Guttman
errors (gPlus) across all items. Figure generated using R version 4.3.3
(2024-02-29)
There were 73 cases (out of 492, i.e. 14.8%) with a number of Guttman errors bigger than (Q3 + 1.5*IQR) = 3184.75.
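The flagging rule used here is Tukey’s upper fence; a sketch on toy values (gPlus stands for the per-participant number of Guttman errors, with illustrative counts only):

```r
set.seed(5)
gPlus <- rpois(492, 2000)                 # toy Guttman-error counts
# Tukey's upper fence: anything above Q3 + 1.5 * IQR is flagged
fence <- unname(quantile(gPlus, 0.75)) + 1.5 * IQR(gPlus)
sum(gPlus > fence)                        # number of flagged participants
```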
The complete item set has a homogeneity value H (se, 95%CI) of 0.201 (0.008), [0.186, 0.216]: this is significantly lower than the recommended 0.30, suggesting that the scale is not homogeneous. This is further supported by the fact that few items have a homogeneity around or above this value (only 2 if we consider the point estimate, and 59 out of 208 if we consider a 95%CI with an upper limit above 0.3). Interestingly, the homogeneity values of related items (different presentations and different orders of the tones) are overall very similar, suggesting again that this is an intrinsic property of the segments and tone(s) and not of their repeated presentation or of the order of the tones.
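These homogeneity values are Loevinger’s H coefficients, as computed for instance by the mokken package; a sketch (the 0/1 response matrix `items` is assumed, not defined here):

```r
library(mokken)
# `items` is assumed to be the participants x items matrix of 0/1 responses
H <- coefH(items)
H$H    # scale homogeneity (with se); H < 0.30 indicates a weak scale
H$Hi   # per-item homogeneity coefficients
```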
These results suggest that there probably is more than one scale, and that some items behave very similarly; but before deciding whether to continue in this direction, let’s see what PCA/FA might suggest.
For PCA, PC1 explains 20.1% of the variance, followed by PC2 which explains 5.5%, suggesting there is a main factor on which most items load, but the story is a bit more complex, with at least a 2nd factor needed (and a lot of variation remaining unexplained by these 2 components):
Figure 25. Screeplot of the PCA of all the tone items
together. Figure generated using R version 4.3.3
(2024-02-29)
Figure 26. Loading of the tone items on the first 2
PCs. Figure generated using R version 4.3.3
(2024-02-29)
Figure 27. The participants plotted on the first 2 PCs,
colored by their qualities of representation (cos2). Figure
generated using R
version 4.3.3 (2024-02-29)
The actual loadings on the first two PCs are:
| PC1 | PC2 | |
|---|---|---|
| Ak1 | -0.07 | -0.05 |
| Ak2 | -0.08 | -0.06 |
| Akt1 | -0.07 | 0.09 |
| Akt2 | -0.07 | 0.09 |
| At1 | -0.08 | -0.03 |
| At2 | -0.08 | -0.04 |
| Atk1 | -0.06 | 0.10 |
| Atk2 | -0.06 | 0.09 |
| Bh1 | -0.08 | -0.03 |
| Bh2 | -0.07 | -0.04 |
| Bhx1 | 0.05 | 0.06 |
| Bhx2 | 0.04 | 0.06 |
| Bs1 | -0.09 | -0.04 |
| Bs2 | -0.09 | -0.05 |
| Bsv1 | -0.07 | 0.08 |
| Bsv2 | -0.07 | 0.10 |
| Bv1 | -0.08 | -0.05 |
| Bv2 | -0.08 | -0.04 |
| Bvs1 | -0.07 | 0.08 |
| Bvs2 | -0.08 | 0.08 |
| Bx1 | -0.07 | -0.03 |
| Bx2 | -0.10 | -0.04 |
| Bxh1 | 0.06 | 0.01 |
| Bxh2 | 0.04 | 0.04 |
| Ch1 | -0.08 | -0.07 |
| Ch2 | -0.10 | -0.05 |
| Chv1 | -0.04 | 0.09 |
| Chv2 | -0.05 | 0.09 |
| Cv1 | -0.07 | -0.06 |
| Cv2 | -0.07 | -0.04 |
| Cvh1 | -0.07 | 0.09 |
| Cvh2 | -0.07 | 0.07 |
| Dc1 | -0.07 | -0.02 |
| Dc2 | -0.07 | -0.07 |
| Dc3 | -0.09 | -0.07 |
| Dc4 | -0.08 | -0.05 |
| Dch1 | -0.01 | 0.10 |
| Dch2 | -0.03 | 0.09 |
| Dcx1 | -0.05 | 0.10 |
| Dcx2 | -0.05 | 0.10 |
| Dh1 | -0.07 | -0.04 |
| Dh2 | -0.06 | -0.07 |
| Dhc1 | 0.01 | 0.08 |
| Dhc2 | 0.00 | 0.10 |
| Dx1 | -0.07 | -0.07 |
| Dx2 | -0.08 | -0.06 |
| Dxc1 | -0.06 | 0.09 |
| Dxc2 | -0.05 | 0.10 |
| Ec1 | -0.08 | -0.06 |
| Ec2 | -0.08 | -0.05 |
| Ec3 | -0.07 | -0.06 |
| Ec4 | -0.08 | -0.08 |
| Ecl1 | -0.08 | 0.08 |
| Ecl2 | -0.08 | 0.09 |
| Ecx1 | -0.05 | 0.08 |
| Ecx2 | -0.05 | 0.10 |
| El1 | -0.07 | -0.06 |
| El2 | -0.08 | -0.07 |
| El3 | -0.07 | -0.06 |
| El4 | -0.08 | -0.08 |
| Elc1 | -0.09 | 0.08 |
| Elc2 | -0.07 | 0.10 |
| Elx1 | -0.09 | 0.05 |
| Elx2 | -0.09 | 0.07 |
| Ex1 | -0.08 | -0.02 |
| Ex2 | -0.06 | -0.04 |
| Ex3 | -0.09 | -0.09 |
| Ex4 | -0.08 | -0.06 |
| Exc1 | -0.04 | 0.09 |
| Exc2 | -0.04 | 0.13 |
| Exl1 | -0.09 | 0.07 |
| Exl2 | -0.10 | 0.06 |
| Fs1 | -0.09 | -0.05 |
| Fs2 | -0.09 | -0.03 |
| Fsv1 | -0.08 | 0.07 |
| Fsv2 | -0.08 | 0.09 |
| Fv1 | -0.09 | -0.03 |
| Fv2 | -0.08 | -0.01 |
| Fvs1 | -0.09 | 0.07 |
| Fvs2 | -0.07 | 0.09 |
| Gt1 | -0.07 | -0.04 |
| Gt2 | -0.08 | -0.05 |
| Gtx1 | -0.09 | 0.07 |
| Gtx2 | -0.08 | 0.09 |
| Gx1 | -0.06 | -0.03 |
| Gx2 | -0.08 | -0.04 |
| Gxt1 | -0.08 | 0.08 |
| Gxt2 | -0.08 | 0.09 |
| Hs1 | -0.07 | -0.04 |
| Hs2 | -0.07 | -0.06 |
| Hst1 | 0.05 | 0.06 |
| Hst2 | 0.05 | 0.07 |
| Ht1 | -0.08 | -0.05 |
| Ht2 | -0.08 | -0.04 |
| Hts1 | 0.05 | 0.05 |
| Hts2 | 0.05 | 0.06 |
| Ic1 | -0.07 | -0.05 |
| Ic2 | -0.09 | -0.05 |
| Icp1 | -0.06 | 0.08 |
| Icp2 | -0.07 | 0.07 |
| Ik1 | -0.07 | -0.04 |
| Ik2 | -0.08 | -0.05 |
| Iks1 | -0.06 | 0.09 |
| Iks2 | -0.07 | 0.09 |
| Ip1 | -0.05 | -0.02 |
| Ip2 | -0.08 | -0.05 |
| Ip3 | -0.05 | -0.05 |
| Ip4 | -0.08 | -0.05 |
| Ip5 | -0.08 | -0.06 |
| Ip6 | -0.08 | -0.03 |
| Ipc1 | -0.08 | 0.08 |
| Ipc2 | -0.08 | 0.08 |
| Ipv1 | -0.05 | 0.08 |
| Ipv2 | -0.05 | 0.10 |
| Ipx1 | -0.08 | 0.07 |
| Ipx2 | -0.08 | 0.07 |
| Is1 | -0.09 | -0.02 |
| Is2 | -0.08 | -0.01 |
| Is3 | -0.09 | -0.05 |
| Is4 | -0.09 | -0.04 |
| Isk1 | -0.07 | 0.05 |
| Isk2 | -0.07 | 0.09 |
| Ist1 | 0.05 | 0.05 |
| Ist2 | 0.05 | 0.06 |
| It1 | -0.08 | -0.05 |
| It2 | -0.07 | -0.03 |
| Its1 | 0.04 | 0.05 |
| Its2 | 0.07 | 0.07 |
| Iv1 | -0.09 | -0.05 |
| Iv2 | -0.08 | -0.06 |
| Ivp1 | -0.05 | 0.10 |
| Ivp2 | -0.05 | 0.09 |
| Ix1 | -0.07 | -0.06 |
| Ix2 | -0.08 | -0.02 |
| Ixp1 | -0.08 | 0.07 |
| Ixp2 | -0.07 | 0.09 |
| Jk1 | -0.07 | -0.02 |
| Jk2 | -0.08 | -0.04 |
| Jkx1 | -0.08 | 0.08 |
| Jkx2 | -0.08 | 0.08 |
| Jx1 | -0.08 | -0.05 |
| Jx2 | -0.07 | -0.02 |
| Jxk1 | -0.07 | 0.07 |
| Jxk2 | -0.07 | 0.09 |
| Kk1 | -0.08 | -0.04 |
| Kk2 | -0.08 | -0.08 |
| Kkl1 | -0.01 | 0.08 |
| Kkl2 | 0.01 | 0.07 |
| Kl1 | -0.08 | -0.06 |
| Kl2 | -0.09 | -0.05 |
| Klk1 | 0.03 | 0.09 |
| Klk2 | 0.03 | 0.08 |
| Lk1 | -0.08 | -0.03 |
| Lk2 | -0.09 | -0.06 |
| Lkl1 | -0.04 | 0.10 |
| Lkl2 | -0.03 | 0.11 |
| Ll1 | -0.09 | -0.02 |
| Ll2 | -0.07 | -0.06 |
| Ll3 | -0.08 | -0.07 |
| Ll4 | -0.07 | -0.06 |
| Llk1 | -0.02 | 0.09 |
| Llk2 | -0.02 | 0.10 |
| Llp1 | 0.04 | 0.08 |
| Llp2 | 0.02 | 0.08 |
| Lp1 | -0.07 | -0.02 |
| Lp2 | -0.08 | -0.05 |
| Lp3 | -0.08 | -0.06 |
| Lp4 | -0.09 | -0.05 |
| Lpl1 | 0.03 | 0.06 |
| Lpl2 | 0.03 | 0.06 |
| Lpv1 | -0.03 | 0.12 |
| Lpv2 | -0.04 | 0.12 |
| Lv1 | -0.07 | -0.07 |
| Lv2 | -0.08 | -0.06 |
| Lvp1 | -0.04 | 0.11 |
| Lvp2 | -0.04 | 0.13 |
| Mc1 | -0.03 | -0.05 |
| Mc2 | -0.03 | -0.08 |
| Mc3 | -0.04 | -0.07 |
| Mc4 | -0.01 | -0.07 |
| Mch1 | -0.02 | 0.08 |
| Mch2 | -0.04 | 0.08 |
| Mcl1 | -0.08 | 0.07 |
| Mcl2 | -0.09 | 0.08 |
| Mh1 | -0.06 | -0.06 |
| Mh2 | -0.07 | -0.05 |
| Mh3 | -0.07 | -0.05 |
| Mh4 | -0.07 | -0.05 |
| Mhc1 | -0.03 | 0.07 |
| Mhc2 | -0.03 | 0.09 |
| Mhl1 | -0.08 | 0.08 |
| Mhl2 | -0.09 | 0.10 |
| Ml1 | -0.08 | -0.06 |
| Ml2 | -0.07 | -0.07 |
| Ml3 | -0.07 | -0.06 |
| Ml4 | -0.05 | -0.05 |
| Mlc1 | -0.09 | 0.04 |
| Mlc2 | -0.08 | 0.10 |
| Mlh1 | -0.08 | 0.06 |
| Mlh2 | -0.09 | 0.07 |
| Nk1 | -0.08 | -0.04 |
| Nk2 | -0.08 | -0.03 |
| Nkp1 | -0.04 | 0.09 |
| Nkp2 | -0.04 | 0.10 |
| Np1 | -0.09 | -0.05 |
| Np2 | -0.07 | -0.04 |
| Npk1 | -0.03 | 0.10 |
| Npk2 | -0.03 | 0.09 |
It can be seen that:
For EFA, all the preliminary tests suggest that factor analysis is appropriate, with the possible exception of a determinant very close to 0.0 (Kaiser-Meyer-Olkin = 0.89 > 0.60; Bartlett’s test is significant: χ2(21528)=61167.2, p=0; and det(cor(data))=7.5e-64 > 0). However, when it comes to the best number of factors, the various methods diverge, but the overall story seems to be that 1 or 2 factors might be enough (but there seems to be a lot of variation beyond this as well, just as in the case of the PCA):
Figure 28. Screeplot of the observed, simulated and
randomized data with 1 standard deviation error bars (as generated by
fa.parallel()). Figure generated using R version 4.3.3
(2024-02-29)
Figure 29. Number of factors as suggested by the VSS
criterion (top left), the complexity of the solution (top right), BIC
(bottom left) and Root Mean Residual (bottom right), as implemented by
nfactors(). Figure generated using R version 4.3.3
(2024-02-29)
Figure 30. Loadings of the variables in the 2-factor
model. Figure generated using R version 4.3.3
(2024-02-29)
and the actual loadings on the two factors are (showing only those ≥ 0.1 in absolute value):
Loadings:
Factor1 Factor2
Ak1 0.461
Ak2 0.516
Akt1 0.499
Akt2 0.522
At1 0.482 0.126
At2 0.472
Atk1 0.512
Atk2 0.477
Bh1 0.450 0.139
Bh2 0.447
Bhx1 -0.376
Bhx2 -0.373
Bs1 0.535 0.128
Bs2 0.570
Bsv1 0.495
Bsv2 0.533
Bv1 0.493
Bv2 0.482
Bvs1 0.118 0.483
Bvs2 0.116 0.513
Bx1 0.416
Bx2 0.569 0.130
Bxh1 -0.309 -0.129
Bxh2 -0.291
Ch1 0.557
Ch2 0.585 0.109
Chv1 0.406
Chv2 0.426
Cv1 0.489
Cv2 0.459
Cvh1 0.479
Cvh2 0.127 0.420
Dc1 0.392 0.128
Dc2 0.530
Dc3 0.638
Dc4 0.523
Dch1 -0.216 0.338
Dch2 0.362
Dcx1 0.479
Dcx2 0.467
Dh1 0.442
Dh2 0.469
Dhc1 -0.246 0.230
Dhc2 -0.252 0.327
Dx1 0.531
Dx2 0.536
Dxc1 0.476
Dxc2 0.475
Ec1 0.507
Ec2 0.506
Ec3 0.497
Ec4 0.628
Ecl1 0.144 0.493
Ecl2 0.108 0.514
Ecx1 0.399
Ecx2 0.464
El1 0.469
El2 0.565
El3 0.506
El4 0.578
Elc1 0.173 0.528
Elc2 0.564
Elx1 0.275 0.437
Elx2 0.206 0.511
Ex1 0.406 0.139
Ex2 0.390
Ex3 0.647
Ex4 0.534
Exc1 0.420
Exc2 -0.167 0.513
Exl1 0.205 0.490
Exl2 0.264 0.495
Fs1 0.557
Fs2 0.516 0.153
Fsv1 0.164 0.461
Fsv2 0.114 0.543
Fv1 0.504 0.144
Fv2 0.369 0.206
Fvs1 0.196 0.501
Fvs2 0.510
Gt1 0.452
Gt2 0.543
Gtx1 0.180 0.509
Gtx2 0.114 0.566
Gx1 0.357
Gx2 0.478
Gxt1 0.141 0.491
Gxt2 0.119 0.538
Hs1 0.447
Hs2 0.520
Hst1 -0.414
Hst2 -0.388
Ht1 0.507
Ht2 0.494 0.113
Hts1 -0.361
Hts2 -0.386
Ic1 0.492
Ic2 0.547
Icp1 0.437
Icp2 0.145 0.431
Ik1 0.429
Ik2 0.491
Iks1 0.469
Iks2 0.536
Ip1 0.298
Ip2 0.492
Ip3 0.385
Ip4 0.507
Ip5 0.506
Ip6 0.435 0.137
Ipc1 0.117 0.506
Ipc2 0.155 0.498
Ipv1 0.382
Ipv2 0.451
Ipx1 0.156 0.464
Ipx2 0.166 0.481
Is1 0.472 0.186
Is2 0.415 0.189
Is3 0.556
Is4 0.513 0.116
Isk1 0.155 0.371
Isk2 0.512
Ist1 -0.389
Ist2 -0.397
It1 0.501
It2 0.404
Its1 -0.349
Its2 -0.490
Iv1 0.575
Iv2 0.538
Ivp1 0.475
Ivp2 0.435
Ix1 0.482
Ix2 0.394 0.175
Ixp1 0.162 0.496
Ixp2 0.528
Jk1 0.374 0.116
Jk2 0.469 0.107
Jkx1 0.156 0.520
Jkx2 0.104 0.518
Jx1 0.512
Jx2 0.383 0.144
Jxk1 0.128 0.452
Jxk2 0.518
Kk1 0.493 0.105
Kk2 0.610
Kkl1 -0.153 0.264
Kkl2 -0.250 0.178
Kl1 0.535
Kl2 0.557
Klk1 -0.349 0.197
Klk2 -0.321 0.159
Lk1 0.480 0.127
Lk2 0.570
Lkl1 0.437
Lkl2 -0.144 0.429
Ll1 0.451 0.173
Ll2 0.521
Ll3 0.571
Ll4 0.489
Llk1 -0.133 0.345
Llk2 -0.141 0.371
Llp1 -0.386 0.145
Llp2 -0.301 0.228
Lp1 0.387 0.127
Lp2 0.510
Lp3 0.544
Lp4 0.528
Lpl1 -0.285 0.105
Lpl2 -0.311 0.101
Lpv1 -0.169 0.467
Lpv2 -0.154 0.489
Lv1 0.537
Lv2 0.529
Lvp1 -0.114 0.458
Lvp2 -0.180 0.515
Mc1 0.276
Mc2 0.341 -0.162
Mc3 0.361 -0.128
Mc4 0.218 -0.173
Mch1 -0.141 0.303
Mch2 0.342
Mcl1 0.183 0.489
Mcl2 0.151 0.536
Mh1 0.459
Mh2 0.483
Mh3 0.483
Mh4 0.463
Mhc1 0.290
Mhc2 -0.101 0.366
Mhl1 0.122 0.495
Mhl2 0.131 0.595
Ml1 0.549
Ml2 0.518
Ml3 0.501
Ml4 0.396
Mlc1 0.313 0.422
Mlc2 0.576
Mlh1 0.184 0.424
Mlh2 0.230 0.504
Nk1 0.471 0.103
Nk2 0.425 0.128
Nkp1 0.408
Nkp2 0.452
Np1 0.543
Np2 0.407
Npk1 -0.118 0.411
Npk2 0.364
Factor1 Factor2
SS loadings 28.929 19.434
Proportion Var 0.139 0.093
Cumulative Var 0.139 0.233
It can be seen that:
Thus, PC1 basically opposes the “weird” items to all the other items, while PC2 opposes the ‘same’ to the ‘different’ items. Likewise, it seems that FA1 really captures the ‘same’ items and the “weird” ‘different’ items (but with opposite signs), while FA2 captures the “normal” ‘different’ items.
These observations prompt the question: are the “weird” items really of the ‘same’ type?
All these together suggest the hypothesis that the “weird” stimuli, while designed as ‘different’, are, in fact, perceived as (albeit rather difficult) ‘same’ items by the participants. If true, this hypothesis implies that coding them as such (i.e., in fact flipping their “correct” and “incorrect” responses) should align them with the other ‘same’ items.
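Under this hypothesis, the recoding amounts to flipping the scored (correct/incorrect) responses for the “weird” items only (a sketch; `weird_items` is shown here truncated to its first few entries):

```r
# the names of the "weird" items (truncated here for brevity)
weird_items <- c("Bhx1", "Bhx2", "Bxh1", "Bxh2", "Hst1", "Hst2")

# flip correct (1) <-> incorrect (0) for these columns only
d_flipped <- d
d_flipped[, weird_items] <- 1 - d_flipped[, weird_items]
```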
Figure 31. ‘Flipped’ weird items: Endorsement
frequencies by item (items ordered by % correct responses). Figure
generated using R
version 4.3.3 (2024-02-29)
Figure 32. ‘Flipped’ weird items: Correlation matrix
between items. Figure generated using R version 4.3.3
(2024-02-29)
Figure 33. Histogram of the tetrachoric correlations
between the different items. Figure generated using R version 4.3.3
(2024-02-29)
Figure 34. ‘Flipped’ weird items: Hierarchical
clustering of the items using 1 - tetrachoric correlations. Figure
generated using R
version 4.3.3 (2024-02-29)
Figure 35. ‘Flipped’ weird items: Mean correlation with
the other items. Figure generated using R version 4.3.3
(2024-02-29)
For PCA, PC1 explains 20.1% of the variance, followed by PC2 which explains 5.5%, suggesting there is a main factor on which most items load, but the story is a bit more complex, with at least a 2nd factor needed (and a lot of variation remaining unexplained by these 2 components):
Figure 36. ‘Flipped’ weird items: Screeplot of the PCA
of all the tone items together. Figure generated using R version 4.3.3
(2024-02-29)
Figure 37. ‘Flipped’ weird items: Loading of the tone
items on the first 2 PCs. Figure generated using R version 4.3.3
(2024-02-29)
Figure 38. ‘Flipped’ weird items: The participants
plotted on the first 2 PCs, colored by their qualities of
representation (cos2). Figure generated using R version 4.3.3
(2024-02-29)
The actual loadings on the first two PCs are:
| PC1 | PC2 | |
|---|---|---|
| Ak1 | -0.07 | -0.05 |
| Ak2 | -0.08 | -0.06 |
| Akt1 | -0.07 | 0.09 |
| Akt2 | -0.07 | 0.09 |
| At1 | -0.08 | -0.03 |
| At2 | -0.08 | -0.04 |
| Atk1 | -0.06 | 0.10 |
| Atk2 | -0.06 | 0.09 |
| Bh1 | -0.08 | -0.03 |
| Bh2 | -0.07 | -0.04 |
| Bhx1 | -0.05 | -0.06 |
| Bhx2 | -0.04 | -0.06 |
| Bs1 | -0.09 | -0.04 |
| Bs2 | -0.09 | -0.05 |
| Bsv1 | -0.07 | 0.08 |
| Bsv2 | -0.07 | 0.10 |
| Bv1 | -0.08 | -0.05 |
| Bv2 | -0.08 | -0.04 |
| Bvs1 | -0.07 | 0.08 |
| Bvs2 | -0.08 | 0.08 |
| Bx1 | -0.07 | -0.03 |
| Bx2 | -0.10 | -0.04 |
| Bxh1 | -0.06 | -0.01 |
| Bxh2 | -0.04 | -0.04 |
| Ch1 | -0.08 | -0.07 |
| Ch2 | -0.10 | -0.05 |
| Chv1 | -0.04 | 0.09 |
| Chv2 | -0.05 | 0.09 |
| Cv1 | -0.07 | -0.06 |
| Cv2 | -0.07 | -0.04 |
| Cvh1 | -0.07 | 0.09 |
| Cvh2 | -0.07 | 0.07 |
| Dc1 | -0.07 | -0.02 |
| Dc2 | -0.07 | -0.07 |
| Dc3 | -0.09 | -0.07 |
| Dc4 | -0.08 | -0.05 |
| Dch1 | -0.01 | 0.10 |
| Dch2 | -0.03 | 0.09 |
| Dcx1 | -0.05 | 0.10 |
| Dcx2 | -0.05 | 0.10 |
| Dh1 | -0.07 | -0.04 |
| Dh2 | -0.06 | -0.07 |
| Dhc1 | 0.01 | 0.08 |
| Dhc2 | 0.00 | 0.10 |
| Dx1 | -0.07 | -0.07 |
| Dx2 | -0.08 | -0.06 |
| Dxc1 | -0.06 | 0.09 |
| Dxc2 | -0.05 | 0.10 |
| Ec1 | -0.08 | -0.06 |
| Ec2 | -0.08 | -0.05 |
| Ec3 | -0.07 | -0.06 |
| Ec4 | -0.08 | -0.08 |
| Ecl1 | -0.08 | 0.08 |
| Ecl2 | -0.08 | 0.09 |
| Ecx1 | -0.05 | 0.08 |
| Ecx2 | -0.05 | 0.10 |
| El1 | -0.07 | -0.06 |
| El2 | -0.08 | -0.07 |
| El3 | -0.07 | -0.06 |
| El4 | -0.08 | -0.08 |
| Elc1 | -0.09 | 0.08 |
| Elc2 | -0.07 | 0.10 |
| Elx1 | -0.09 | 0.05 |
| Elx2 | -0.09 | 0.07 |
| Ex1 | -0.08 | -0.02 |
| Ex2 | -0.06 | -0.04 |
| Ex3 | -0.09 | -0.09 |
| Ex4 | -0.08 | -0.06 |
| Exc1 | -0.04 | 0.09 |
| Exc2 | -0.04 | 0.13 |
| Exl1 | -0.09 | 0.07 |
| Exl2 | -0.10 | 0.06 |
| Fs1 | -0.09 | -0.05 |
| Fs2 | -0.09 | -0.03 |
| Fsv1 | -0.08 | 0.07 |
| Fsv2 | -0.08 | 0.09 |
| Fv1 | -0.09 | -0.03 |
| Fv2 | -0.08 | -0.01 |
| Fvs1 | -0.09 | 0.07 |
| Fvs2 | -0.07 | 0.09 |
| Gt1 | -0.07 | -0.04 |
| Gt2 | -0.08 | -0.05 |
| Gtx1 | -0.09 | 0.07 |
| Gtx2 | -0.08 | 0.09 |
| Gx1 | -0.06 | -0.03 |
| Gx2 | -0.08 | -0.04 |
| Gxt1 | -0.08 | 0.08 |
| Gxt2 | -0.08 | 0.09 |
| Hs1 | -0.07 | -0.04 |
| Hs2 | -0.07 | -0.06 |
| Hst1 | -0.05 | -0.06 |
| Hst2 | -0.05 | -0.07 |
| Ht1 | -0.08 | -0.05 |
| Ht2 | -0.08 | -0.04 |
| Hts1 | -0.05 | -0.05 |
| Hts2 | -0.05 | -0.06 |
| Ic1 | -0.07 | -0.05 |
| Ic2 | -0.09 | -0.05 |
| Icp1 | -0.06 | 0.08 |
| Icp2 | -0.07 | 0.07 |
| Ik1 | -0.07 | -0.04 |
| Ik2 | -0.08 | -0.05 |
| Iks1 | -0.06 | 0.09 |
| Iks2 | -0.07 | 0.09 |
| Ip1 | -0.05 | -0.02 |
| Ip2 | -0.08 | -0.05 |
| Ip3 | -0.05 | -0.05 |
| Ip4 | -0.08 | -0.05 |
| Ip5 | -0.08 | -0.06 |
| Ip6 | -0.08 | -0.03 |
| Ipc1 | -0.08 | 0.08 |
| Ipc2 | -0.08 | 0.08 |
| Ipv1 | -0.05 | 0.08 |
| Ipv2 | -0.05 | 0.10 |
| Ipx1 | -0.08 | 0.07 |
| Ipx2 | -0.08 | 0.07 |
| Is1 | -0.09 | -0.02 |
| Is2 | -0.08 | -0.01 |
| Is3 | -0.09 | -0.05 |
| Is4 | -0.09 | -0.04 |
| Isk1 | -0.07 | 0.05 |
| Isk2 | -0.07 | 0.09 |
| Ist1 | -0.05 | -0.05 |
| Ist2 | -0.05 | -0.06 |
| It1 | -0.08 | -0.05 |
| It2 | -0.07 | -0.03 |
| Its1 | -0.04 | -0.05 |
| Its2 | -0.07 | -0.07 |
| Iv1 | -0.09 | -0.05 |
| Iv2 | -0.08 | -0.06 |
| Ivp1 | -0.05 | 0.10 |
| Ivp2 | -0.05 | 0.09 |
| Ix1 | -0.07 | -0.06 |
| Ix2 | -0.08 | -0.02 |
| Ixp1 | -0.08 | 0.07 |
| Ixp2 | -0.07 | 0.09 |
| Jk1 | -0.07 | -0.02 |
| Jk2 | -0.08 | -0.04 |
| Jkx1 | -0.08 | 0.08 |
| Jkx2 | -0.08 | 0.08 |
| Jx1 | -0.08 | -0.05 |
| Jx2 | -0.07 | -0.02 |
| Jxk1 | -0.07 | 0.07 |
| Jxk2 | -0.07 | 0.09 |
| Kk1 | -0.08 | -0.04 |
| Kk2 | -0.08 | -0.08 |
| Kkl1 | 0.01 | -0.08 |
| Kkl2 | -0.01 | -0.07 |
| Kl1 | -0.08 | -0.06 |
| Kl2 | -0.09 | -0.05 |
| Klk1 | -0.03 | -0.09 |
| Klk2 | -0.03 | -0.08 |
| Lk1 | -0.08 | -0.03 |
| Lk2 | -0.09 | -0.06 |
| Lkl1 | -0.04 | 0.10 |
| Lkl2 | -0.03 | 0.11 |
| Ll1 | -0.09 | -0.02 |
| Ll2 | -0.07 | -0.06 |
| Ll3 | -0.08 | -0.07 |
| Ll4 | -0.07 | -0.06 |
| Llk1 | -0.02 | 0.09 |
| Llk2 | -0.02 | 0.10 |
| Llp1 | -0.04 | -0.08 |
| Llp2 | -0.02 | -0.08 |
| Lp1 | -0.07 | -0.02 |
| Lp2 | -0.08 | -0.05 |
| Lp3 | -0.08 | -0.06 |
| Lp4 | -0.09 | -0.05 |
| Lpl1 | -0.03 | -0.06 |
| Lpl2 | -0.03 | -0.06 |
| Lpv1 | -0.03 | 0.12 |
| Lpv2 | -0.04 | 0.12 |
| Lv1 | -0.07 | -0.07 |
| Lv2 | -0.08 | -0.06 |
| Lvp1 | -0.04 | 0.11 |
| Lvp2 | -0.04 | 0.13 |
| Mc1 | -0.03 | -0.05 |
| Mc2 | -0.03 | -0.08 |
| Mc3 | -0.04 | -0.07 |
| Mc4 | -0.01 | -0.07 |
| Mch1 | -0.02 | 0.08 |
| Mch2 | -0.04 | 0.08 |
| Mcl1 | -0.08 | 0.07 |
| Mcl2 | -0.09 | 0.08 |
| Mh1 | -0.06 | -0.06 |
| Mh2 | -0.07 | -0.05 |
| Mh3 | -0.07 | -0.05 |
| Mh4 | -0.07 | -0.05 |
| Mhc1 | -0.03 | 0.07 |
| Mhc2 | -0.03 | 0.09 |
| Mhl1 | -0.08 | 0.08 |
| Mhl2 | -0.09 | 0.10 |
| Ml1 | -0.08 | -0.06 |
| Ml2 | -0.07 | -0.07 |
| Ml3 | -0.07 | -0.06 |
| Ml4 | -0.05 | -0.05 |
| Mlc1 | -0.09 | 0.04 |
| Mlc2 | -0.08 | 0.10 |
| Mlh1 | -0.08 | 0.06 |
| Mlh2 | -0.09 | 0.07 |
| Nk1 | -0.08 | -0.04 |
| Nk2 | -0.08 | -0.03 |
| Nkp1 | -0.04 | 0.09 |
| Nkp2 | -0.04 | 0.10 |
| Np1 | -0.09 | -0.05 |
| Np2 | -0.07 | -0.04 |
| Npk1 | -0.03 | 0.10 |
| Npk2 | -0.03 | 0.09 |
It can be seen that:
For EFA, all the preliminary tests suggest that factor analysis is appropriate, with the possible exception of a determinant very close to 0.0 (Kaiser-Meyer-Olkin = 0.89 > 0.60; Bartlett’s test is significant: χ2(21528)=61167.2, p=0; and det(cor(data))=7.5e-64 > 0). However, when it comes to the best number of factors, the various methods diverge, but the overall story seems to be that 1 or 2 factors might be enough (but there seems to be a lot of variation beyond this as well, just as in the case of the PCA):
Figure 39. ‘Flipped’ weird items: Screeplot of the
observed, simulated and randomized data with 1 standard deviation error
bars (as generated by fa.parallel()). Figure generated
using R version
4.3.3 (2024-02-29)
Figure 40. ‘Flipped’ weird items: Number of factors as
suggested by the VSS criterion (top left), the complexity of the
solution (top right), BIC (bottom left) and Root Mean Residual (bottom
right), as implemented by nfactors(). Figure generated
using R version
4.3.3 (2024-02-29)
Figure 41. ‘Flipped’ weird items: Loadings of the
variables in the 2-factors model. Figure generated using R version 4.3.3
(2024-02-29)
and the actual loadings on the two factors are (showing only those ≥ 0.1 in absolute value):
Loadings:
Factor1 Factor2
Ak1 0.461
Ak2 0.516
Akt1 0.499
Akt2 0.522
At1 0.482 0.126
At2 0.472
Atk1 0.512
Atk2 0.477
Bh1 0.450 0.139
Bh2 0.447
Bhx1 0.376
Bhx2 0.373
Bs1 0.535 0.128
Bs2 0.570
Bsv1 0.495
Bsv2 0.533
Bv1 0.493
Bv2 0.482
Bvs1 0.118 0.483
Bvs2 0.116 0.513
Bx1 0.416
Bx2 0.569 0.130
Bxh1 0.309 0.129
Bxh2 0.291
Ch1 0.557
Ch2 0.585 0.109
Chv1 0.406
Chv2 0.426
Cv1 0.489
Cv2 0.459
Cvh1 0.479
Cvh2 0.127 0.420
Dc1 0.392 0.128
Dc2 0.530
Dc3 0.638
Dc4 0.523
Dch1 -0.216 0.338
Dch2 0.362
Dcx1 0.479
Dcx2 0.467
Dh1 0.442
Dh2 0.469
Dhc1 -0.246 0.230
Dhc2 -0.252 0.327
Dx1 0.531
Dx2 0.536
Dxc1 0.476
Dxc2 0.475
Ec1 0.507
Ec2 0.506
Ec3 0.497
Ec4 0.628
Ecl1 0.144 0.493
Ecl2 0.108 0.514
Ecx1 0.399
Ecx2 0.464
El1 0.469
El2 0.565
El3 0.506
El4 0.578
Elc1 0.173 0.528
Elc2 0.564
Elx1 0.275 0.437
Elx2 0.206 0.511
Ex1 0.406 0.139
Ex2 0.390
Ex3 0.647
Ex4 0.534
Exc1 0.420
Exc2 -0.167 0.513
Exl1 0.205 0.490
Exl2 0.264 0.495
Fs1 0.557
Fs2 0.516 0.153
Fsv1 0.164 0.461
Fsv2 0.114 0.543
Fv1 0.504 0.144
Fv2 0.369 0.206
Fvs1 0.196 0.501
Fvs2 0.510
Gt1 0.452
Gt2 0.543
Gtx1 0.180 0.509
Gtx2 0.114 0.566
Gx1 0.357
Gx2 0.478
Gxt1 0.141 0.491
Gxt2 0.119 0.538
Hs1 0.447
Hs2 0.520
Hst1 0.414
Hst2 0.388
Ht1 0.507
Ht2 0.494 0.113
Hts1 0.361
Hts2 0.386
Ic1 0.492
Ic2 0.547
Icp1 0.437
Icp2 0.145 0.431
Ik1 0.429
Ik2 0.491
Iks1 0.469
Iks2 0.536
Ip1 0.298
Ip2 0.492
Ip3 0.385
Ip4 0.507
Ip5 0.506
Ip6 0.435 0.137
Ipc1 0.117 0.506
Ipc2 0.155 0.498
Ipv1 0.382
Ipv2 0.451
Ipx1 0.156 0.464
Ipx2 0.166 0.481
Is1 0.472 0.186
Is2 0.415 0.189
Is3 0.556
Is4 0.513 0.116
Isk1 0.155 0.371
Isk2 0.512
Ist1 0.389
Ist2 0.397
It1 0.501
It2 0.404
Its1 0.349
Its2 0.490
Iv1 0.575
Iv2 0.538
Ivp1 0.475
Ivp2 0.435
Ix1 0.482
Ix2 0.394 0.175
Ixp1 0.162 0.496
Ixp2 0.528
Jk1 0.374 0.116
Jk2 0.469 0.107
Jkx1 0.156 0.520
Jkx2 0.104 0.518
Jx1 0.512
Jx2 0.383 0.144
Jxk1 0.128 0.452
Jxk2 0.518
Kk1 0.493 0.105
Kk2 0.610
Kkl1 0.153 -0.264
Kkl2 0.250 -0.178
Kl1 0.535
Kl2 0.557
Klk1 0.349 -0.197
Klk2 0.321 -0.159
Lk1 0.480 0.127
Lk2 0.570
Lkl1 0.437
Lkl2 -0.144 0.429
Ll1 0.451 0.173
Ll2 0.521
Ll3 0.571
Ll4 0.489
Llk1 -0.133 0.345
Llk2 -0.141 0.371
Llp1 0.386 -0.145
Llp2 0.301 -0.228
Lp1 0.387 0.127
Lp2 0.510
Lp3 0.544
Lp4 0.528
Lpl1 0.285 -0.105
Lpl2 0.311 -0.101
Lpv1 -0.169 0.467
Lpv2 -0.154 0.489
Lv1 0.537
Lv2 0.529
Lvp1 -0.114 0.458
Lvp2 -0.180 0.515
Mc1 0.276
Mc2 0.341 -0.162
Mc3 0.361 -0.128
Mc4 0.218 -0.173
Mch1 -0.141 0.303
Mch2 0.342
Mcl1 0.183 0.489
Mcl2 0.151 0.536
Mh1 0.459
Mh2 0.483
Mh3 0.483
Mh4 0.463
Mhc1 0.290
Mhc2 -0.101 0.366
Mhl1 0.122 0.495
Mhl2 0.131 0.595
Ml1 0.549
Ml2 0.518
Ml3 0.501
Ml4 0.396
Mlc1 0.313 0.422
Mlc2 0.576
Mlh1 0.184 0.424
Mlh2 0.230 0.504
Nk1 0.471 0.103
Nk2 0.425 0.128
Nkp1 0.408
Nkp2 0.452
Np1 0.543
Np2 0.407
Npk1 -0.118 0.411
Npk2 0.364
Factor1 Factor2
SS loadings 28.929 19.434
Proportion Var 0.139 0.093
Cumulative Var 0.139 0.233
It can be seen that:
Thus, it seems that the main difference now is between the (“extended”) ‘same’ and the ‘different’ items, which seems to make much more sense.
| Row.names | PC1.original | PC1.flipped | PC2.original | PC2.flipped |
|---|---|---|---|---|
| Bhx1 | 0.05 | -0.05 | 0.06 | -0.06 |
| Bhx2 | 0.04 | -0.04 | 0.06 | -0.06 |
| Bxh1 | 0.06 | -0.06 | 0.01 | -0.01 |
| Bxh2 | 0.04 | -0.04 | 0.04 | -0.04 |
| Hst1 | 0.05 | -0.05 | 0.06 | -0.06 |
| Hst2 | 0.05 | -0.05 | 0.07 | -0.07 |
| Hts1 | 0.05 | -0.05 | 0.05 | -0.05 |
| Hts2 | 0.05 | -0.05 | 0.06 | -0.06 |
| Ist1 | 0.05 | -0.05 | 0.05 | -0.05 |
| Ist2 | 0.05 | -0.05 | 0.06 | -0.06 |
| Its1 | 0.04 | -0.04 | 0.05 | -0.05 |
| Its2 | 0.07 | -0.07 | 0.07 | -0.07 |
| Kkl1 | -0.01 | 0.01 | 0.08 | -0.08 |
| Kkl2 | 0.01 | -0.01 | 0.07 | -0.07 |
| Klk1 | 0.03 | -0.03 | 0.09 | -0.09 |
| Klk2 | 0.03 | -0.03 | 0.08 | -0.08 |
| Llp1 | 0.04 | -0.04 | 0.08 | -0.08 |
| Llp2 | 0.02 | -0.02 | 0.08 | -0.08 |
| Lpl1 | 0.03 | -0.03 | 0.06 | -0.06 |
| Lpl2 | 0.03 | -0.03 | 0.06 | -0.06 |
| Row.names | Factor1.original | Factor1.flipped | Factor2.original | Factor2.flipped |
|---|---|---|---|---|
| Bhx1 | -0.38 | 0.38 | 0.07 | -0.07 |
| Bhx2 | -0.37 | 0.37 | 0.08 | -0.08 |
| Bxh1 | -0.31 | 0.31 | -0.13 | 0.13 |
| Bxh2 | -0.29 | 0.29 | 0.02 | -0.02 |
| Hst1 | -0.41 | 0.41 | 0.05 | -0.05 |
| Hst2 | -0.39 | 0.39 | 0.09 | -0.09 |
| Hts1 | -0.36 | 0.36 | 0.03 | -0.03 |
| Hts2 | -0.39 | 0.39 | 0.06 | -0.06 |
| Ist1 | -0.39 | 0.39 | 0.01 | -0.01 |
| Ist2 | -0.40 | 0.40 | 0.02 | -0.02 |
| Its1 | -0.35 | 0.35 | 0.05 | -0.05 |
| Its2 | -0.49 | 0.49 | 0.04 | -0.04 |
| Kkl1 | -0.15 | 0.15 | 0.26 | -0.26 |
| Kkl2 | -0.25 | 0.25 | 0.18 | -0.18 |
| Klk1 | -0.35 | 0.35 | 0.20 | -0.20 |
| Klk2 | -0.32 | 0.32 | 0.16 | -0.16 |
| Llp1 | -0.39 | 0.39 | 0.15 | -0.15 |
| Llp2 | -0.30 | 0.30 | 0.23 | -0.23 |
| Lpl1 | -0.29 | 0.29 | 0.11 | -0.11 |
| Lpl2 | -0.31 | 0.31 | 0.10 | -0.10 |
Returning to the Mokken analysis:
The complete item set has a homogeneity value H (se, 95%CI) of 0.223, (0.008), [0.207, 0.238]: this is significantly lower than the recommended 0.30, suggesting that the scale is not homogeneous. This is further supported by the fact that few items have a homogeneity around or above this value (only 19 if we consider the point estimate, and 93 out of 208 if we consider a 95%CI with an upper limit above 0.3). Interestingly, the homogeneity values of related items (different presentations and different orders of the tones) are overall very similar, suggesting again that this is an intrinsic property of the segments and tone(s) and not of their repeated presentation or order of tones.
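These homogeneity values H (overall and per item) can be obtained with `mokken::coefH()` (a sketch, assuming the binary responses in `d`; the standard errors are returned alongside the point estimates):

```r
library(mokken)

# scalability coefficients with standard errors
H <- coefH(d, se = TRUE, nice.output = FALSE)
H$H    # overall scale homogeneity H (with se)
H$Hi   # per-item homogeneity values Hi
```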
The results are slightly better, but still far from ideal for IRT…
So, it seems that the “weird” stimuli Bhx, Hst, Ist, Kkl1 and Llp (all presentations and variants) are perceived by the participants as (rather difficult) ‘same’-type items and not as the intended ‘different’-type items. With this change, the items seem to fall into the two natural classes ‘same’ vs ‘different’ (even if there is a lot of unaccounted variation).
The MSA suggests that there are too many items: the successive presentations of the same item do not seem to make a difference and, for the (by design) ‘different’ items, the order of the tones does not seem to matter either. If this is so, we can reduce the item set by:
With these:
The complete item set has a homogeneity value H (se, 95%CI) of 0.230, (0.010), [0.210, 0.249]: this is significantly lower than the recommended 0.30, suggesting that the scale is not homogeneous. This is further supported by the fact that few items have a homogeneity around or above this value (only 5 if we consider the point estimate, and 33 out of 66 if we consider a 95%CI with an upper limit above 0.3). Interestingly, the homogeneity values of related items (different presentations and different orders of the tones) are overall very similar, suggesting again that this is an intrinsic property of the segments and tone(s) and not of their repeated presentation or order of tones.
Let’s iteratively remove the unscalable items for c ≤ 0.30:
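This iterative removal can be sketched as a simple loop over `coefH()` (a sketch, not necessarily the exact procedure used here): repeatedly drop the item with the lowest item homogeneity Hi until all remaining items reach the threshold.

```r
library(mokken)

d_sel <- d
repeat {
  # per-item homogeneity values (numeric, no formatting)
  Hi <- coefH(d_sel, se = FALSE, nice.output = FALSE)$Hi
  if (min(Hi) >= 0.30) break      # all items scalable at c = 0.30
  d_sel <- d_sel[, -which.min(Hi)] # drop the least scalable item
}
colnames(d_sel)  # the retained items
```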
After removing 21 items (Mch1, Chv1, Kkl1, Mc1, Ip1, Dch1, Ecx1, Hst1, Llp1, Dcx1, Lpv1, Gx1, Jk1, Icp1, Lkl1, Ix1, Lp1, Mh1, Ist1, It1, Iks1), we are left with the 45 items (Ak1, Akt1, At1, Bh1, Bhx1, Bs1, Bsv1, Bv1, Bx1, Ch1, Cv1, Dc1, Dh1, Dx1, Ec1, Ecl1, El1, Elx1, Ex1, Fs1, Fsv1, Fv1, Gt1, Gtx1, Hs1, Ht1, Ic1, Ik1, Ipv1, Ipx1, Is1, Iv1, Jkx1, Jx1, Kk1, Kl1, Lk1, Ll1, Lv1, Mcl1, Mhl1, Ml1, Nk1, Nkp1, Np1), covering both ‘same’ and ‘different’ items, which seem to form a single scale (more or less, especially at c = 0.30):
and the subscale’s H is now a much better 0.294, (0.013), [0.269, 0.320]:
| Item H | se | 95% ci | |
|---|---|---|---|
| Ak1 | 0.265 | (0.033) | [0.200, 0.330] |
| Akt1 | 0.263 | (0.037) | [0.191, 0.335] |
| At1 | 0.308 | (0.028) | [0.252, 0.363] |
| Bh1 | 0.310 | (0.031) | [0.250, 0.371] |
| Bhx1 | 0.235 | (0.040) | [0.156, 0.313] |
| Bs1 | 0.344 | (0.027) | [0.292, 0.397] |
| Bsv1 | 0.245 | (0.033) | [0.180, 0.311] |
| Bv1 | 0.306 | (0.036) | [0.236, 0.376] |
| Bx1 | 0.292 | (0.037) | [0.219, 0.364] |
| Ch1 | 0.299 | (0.030) | [0.240, 0.358] |
| Cv1 | 0.266 | (0.030) | [0.207, 0.324] |
| Dc1 | 0.292 | (0.029) | [0.235, 0.349] |
| Dh1 | 0.265 | (0.032) | [0.202, 0.329] |
| Dx1 | 0.290 | (0.033) | [0.225, 0.355] |
| Ec1 | 0.300 | (0.031) | [0.240, 0.360] |
| Ecl1 | 0.281 | (0.033) | [0.216, 0.345] |
| El1 | 0.268 | (0.035) | [0.200, 0.336] |
| Elx1 | 0.313 | (0.034) | [0.246, 0.380] |
| Ex1 | 0.308 | (0.036) | [0.237, 0.379] |
| Fs1 | 0.364 | (0.027) | [0.311, 0.416] |
| Fsv1 | 0.257 | (0.032) | [0.195, 0.320] |
| Fv1 | 0.334 | (0.030) | [0.275, 0.393] |
| Gt1 | 0.305 | (0.032) | [0.242, 0.368] |
| Gtx1 | 0.307 | (0.025) | [0.258, 0.355] |
| Hs1 | 0.320 | (0.032) | [0.257, 0.384] |
| Ht1 | 0.299 | (0.030) | [0.240, 0.358] |
| Ic1 | 0.268 | (0.032) | [0.205, 0.331] |
| Ik1 | 0.283 | (0.034) | [0.217, 0.348] |
| Ipv1 | 0.255 | (0.047) | [0.163, 0.346] |
| Ipx1 | 0.270 | (0.029) | [0.213, 0.326] |
| Is1 | 0.326 | (0.028) | [0.270, 0.381] |
| Iv1 | 0.340 | (0.027) | [0.287, 0.394] |
| Jkx1 | 0.283 | (0.030) | [0.224, 0.343] |
| Jx1 | 0.320 | (0.036) | [0.249, 0.391] |
| Kk1 | 0.316 | (0.030) | [0.257, 0.376] |
| Kl1 | 0.362 | (0.037) | [0.290, 0.434] |
| Lk1 | 0.306 | (0.029) | [0.249, 0.363] |
| Ll1 | 0.322 | (0.034) | [0.256, 0.388] |
| Lv1 | 0.278 | (0.033) | [0.213, 0.342] |
| Mcl1 | 0.301 | (0.027) | [0.247, 0.355] |
| Mhl1 | 0.260 | (0.031) | [0.200, 0.320] |
| Ml1 | 0.308 | (0.034) | [0.241, 0.375] |
| Nk1 | 0.305 | (0.030) | [0.247, 0.364] |
| Nkp1 | 0.232 | (0.042) | [0.149, 0.315] |
| Np1 | 0.321 | (0.031) | [0.261, 0.381] |
22 items do not meet the local independence criterion (Ak1, Akt1, Bhx1, Bsv1, Ch1, Cv1, Dh1, Ec1, Ecl1, Elx1, Fs1, Fsv1, Gt1, Gtx1, Ipv1, Ipx1, Iv1, Jkx1, Lk1, Mcl1, Mhl1, Nkp1) and are excluded from the analysis, leaving the 23 items At1, Bh1, Bs1, Bv1, Bx1, Dc1, Dx1, El1, Ex1, Fv1, Hs1, Ht1, Ic1, Ik1, Is1, Jx1, Kk1, Kl1, Ll1, Lv1, Ml1, Nk1, Np1.
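These Mokken checks (local independence via conditional association, monotonicity, and invariant item ordering) can be sketched with the `mokken` `check.*` functions (assuming the reduced response matrix is in `d_sub`):

```r
library(mokken)

# local independence via conditional association
ca <- check.ca(d_sub)
ca$InScale  # which items survive the local independence criterion

# monotonicity, with the default minsize
mono <- check.monotonicity(d_sub)
summary(mono)

# invariant item ordering (IIO)
iio <- check.iio(d_sub)
summary(iio)
```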
Monotonicity tests for the remaining items are shown below for the default minsize:
Invariant item ordering (IIO) tests are shown below for the default minsize:
and the subscale’s H is now a much better 0.337, (0.024), [0.290, 0.384]:
| Item H | se | 95% ci | |
|---|---|---|---|
| At1 | 0.315 | (0.036) | [0.245, 0.386] |
| Bh1 | 0.349 | (0.038) | [0.275, 0.423] |
| Bs1 | 0.367 | (0.034) | [0.300, 0.434] |
| Bv1 | 0.340 | (0.041) | [0.260, 0.419] |
| Bx1 | 0.336 | (0.044) | [0.250, 0.423] |
| Dc1 | 0.326 | (0.039) | [0.250, 0.403] |
| Dx1 | 0.327 | (0.039) | [0.251, 0.404] |
| El1 | 0.292 | (0.041) | [0.212, 0.373] |
| Ex1 | 0.332 | (0.042) | [0.249, 0.414] |
| Fv1 | 0.366 | (0.036) | [0.296, 0.437] |
| Hs1 | 0.349 | (0.042) | [0.267, 0.431] |
| Ht1 | 0.339 | (0.037) | [0.267, 0.412] |
| Ic1 | 0.287 | (0.039) | [0.211, 0.363] |
| Ik1 | 0.309 | (0.040) | [0.232, 0.387] |
| Is1 | 0.333 | (0.034) | [0.266, 0.400] |
| Jx1 | 0.351 | (0.041) | [0.270, 0.432] |
| Kk1 | 0.354 | (0.038) | [0.280, 0.427] |
| Kl1 | 0.412 | (0.046) | [0.321, 0.503] |
| Ll1 | 0.341 | (0.039) | [0.265, 0.417] |
| Lv1 | 0.308 | (0.040) | [0.230, 0.386] |
| Ml1 | 0.344 | (0.039) | [0.267, 0.421] |
| Nk1 | 0.333 | (0.037) | [0.261, 0.405] |
| Np1 | 0.349 | (0.037) | [0.276, 0.421] |
However, it can be seen that this subscale is composed entirely of ‘same’ items, which suggests that IRT/Mokken are not appropriate for analyzing such data, but they were, nevertheless, extremely useful in helping to detect the “weird” items.
Given the above analyses, it seems that the optimal way forward is:
These three tone datasets all have the same number of participants (492, as we did not yet remove any outliers), but different numbers of items: 208 (original), 188 (removed), and 208 (recoded), respectively.
As a reminder, there are 20 “weird” items in total (Bhx1, Bhx2, Bxh1, Bxh2, Hst1, Hst2, Hts1, Hts2, Ist1, Ist2, Its1, Its2, Kkl1, Kkl2, Klk1, Klk2, Llp1, Llp2, Lpl1, Lpl2).
Figure 42. Histogram of the % correct on each of the
three datasets. Figure generated using R version 4.3.3
(2024-02-29)
| pc_tot_orig | pc_tot_remv | pc_tot_recd | |
|---|---|---|---|
| pc_tot_orig | 1 | 0.97 | 0.9 |
| pc_tot_remv | 0.97 | 1 | 0.97 |
| pc_tot_recd | 0.9 | 0.97 | 1 |
Figure 43. Heatmap with clustering for the Spearman
correlations between % correct total across the three datasets. Figure
generated using R
version 4.3.3 (2024-02-29)
The correlations between the total % correct across the three datasets are very large (≥0.90) suggesting that we can simply use any one of them.
Figure 44. Histogram of the % correct ‘same’ (top) and
‘different’ (bottom) on each of the three datasets. Figure generated
using R version
4.3.3 (2024-02-29)
| pc_same_orig | pc_diff_orig | pc_same_remv | pc_diff_remv | |
|---|---|---|---|---|
| pc_same_orig | 1 | 0.23 | 1 | 0.34 |
| pc_diff_orig | 0.23 | 1 | 0.23 | 0.97 |
| pc_same_remv | 1 | 0.23 | 1 | 0.34 |
| pc_diff_remv | 0.34 | 0.97 | 0.34 | 1 |
| pc_same_recd | 0.94 | 0.08 | 0.94 | 0.24 |
| pc_diff_recd | 0.34 | 0.97 | 0.34 | 1 |
| pc_same_recd | pc_diff_recd | |
|---|---|---|
| pc_same_orig | 0.94 | 0.34 |
| pc_diff_orig | 0.08 | 0.97 |
| pc_same_remv | 0.94 | 0.34 |
| pc_diff_remv | 0.24 | 1 |
| pc_same_recd | 1 | 0.24 |
| pc_diff_recd | 0.24 | 1 |
Figure 45. Heatmap with clustering for the Spearman
correlations between % correct for ‘same’ and ‘different’ items across
the three datasets. Figure generated using R version 4.3.3
(2024-02-29)
It is a similar story here, with the % correct for the ‘same’ items being highly intercorrelated (≥0.94) as are those for the ‘different’ items (≥0.97), but the correlations between these two types are low (between 0.09 and 0.35) with the lowest for the ‘original’ dataset; therefore we will use the results for the ‘remove’ (or the ‘recode’) dataset.
We estimate the following measures (using
psycho::dprime(), see Pallier
(2002), https://bookdown.org/danbarch/psy_207_advanced_stats_I/signal-detection-theory.html
and https://www.birmingham.ac.uk/Documents/college-les/psych/vision-laboratory/sdtintro.pdf
for details):
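The call itself can be sketched as follows, counting (hypothetically, for one participant) ‘different’ trials answered ‘different’ as hits and ‘same’ trials answered ‘different’ as false alarms (the counts below are made up for illustration):

```r
library(psycho)

# hypothetical response counts for one participant
sdt <- dprime(n_hit = 80, n_fa = 12, n_miss = 24, n_cr = 92)
sdt$dprime  # sensitivity d'
sdt$c       # bias c (sign depends on what counts as the 'signal')
sdt$aprime  # non-parametric sensitivity A'
sdt$bppd    # non-parametric bias B''D
```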
Figure 46. Histogram of the % correct ‘same’ (top) and
‘different’ (bottom) on each of the three datasets. Figure generated
using R version
4.3.3 (2024-02-29)
| dprime_orig | beta_orig | c_orig | aprime_orig | bppd_orig | |
|---|---|---|---|---|---|
| dprime_orig | 1 | 0.8 | 0.55 | 0.92 | 0.8 |
| beta_orig | 0.8 | 1 | 0.9 | 0.54 | 1 |
| c_orig | 0.55 | 0.9 | 1 | 0.29 | 0.91 |
| aprime_orig | 0.92 | 0.54 | 0.29 | 1 | 0.55 |
| bppd_orig | 0.8 | 1 | 0.91 | 0.55 | 1 |
| dprime_remv | 0.98 | 0.71 | 0.46 | 0.95 | 0.71 |
| beta_remv | 0.51 | 0.89 | 0.91 | 0.21 | 0.89 |
| c_remv | 0.28 | 0.74 | 0.92 | -0.01 | 0.75 |
| aprime_remv | 0.91 | 0.55 | 0.3 | 0.98 | 0.56 |
| bppd_remv | 0.53 | 0.89 | 0.92 | 0.23 | 0.9 |
| dprime_recd | 0.93 | 0.69 | 0.46 | 0.9 | 0.69 |
| beta_recd | 0.26 | 0.74 | 0.84 | -0.06 | 0.73 |
| c_recd | 0.07 | 0.59 | 0.82 | -0.23 | 0.6 |
| aprime_recd | 0.9 | 0.59 | 0.35 | 0.95 | 0.6 |
| bppd_recd | 0.25 | 0.74 | 0.85 | -0.07 | 0.73 |
| dprime_remv | beta_remv | c_remv | aprime_remv | bppd_remv | |
|---|---|---|---|---|---|
| dprime_orig | 0.98 | 0.51 | 0.28 | 0.91 | 0.53 |
| beta_orig | 0.71 | 0.89 | 0.74 | 0.55 | 0.89 |
| c_orig | 0.46 | 0.91 | 0.92 | 0.3 | 0.92 |
| aprime_orig | 0.95 | 0.21 | -0.01 | 0.98 | 0.23 |
| bppd_orig | 0.71 | 0.89 | 0.75 | 0.56 | 0.9 |
| dprime_remv | 1 | 0.38 | 0.14 | 0.96 | 0.39 |
| beta_remv | 0.38 | 1 | 0.92 | 0.21 | 0.99 |
| c_remv | 0.14 | 0.92 | 1 | -0.02 | 0.92 |
| aprime_remv | 0.96 | 0.21 | -0.02 | 1 | 0.23 |
| bppd_remv | 0.39 | 0.99 | 0.92 | 0.23 | 1 |
| dprime_recd | 0.97 | 0.36 | 0.13 | 0.94 | 0.37 |
| beta_recd | 0.14 | 0.92 | 0.91 | -0.03 | 0.9 |
| c_recd | -0.05 | 0.81 | 0.95 | -0.21 | 0.81 |
| aprime_recd | 0.96 | 0.25 | 0.03 | 0.99 | 0.27 |
| bppd_recd | 0.13 | 0.91 | 0.92 | -0.04 | 0.9 |
| dprime_recd | beta_recd | c_recd | aprime_recd | bppd_recd | |
|---|---|---|---|---|---|
| dprime_orig | 0.93 | 0.26 | 0.07 | 0.9 | 0.25 |
| beta_orig | 0.69 | 0.74 | 0.59 | 0.59 | 0.74 |
| c_orig | 0.46 | 0.84 | 0.82 | 0.35 | 0.85 |
| aprime_orig | 0.9 | -0.06 | -0.23 | 0.95 | -0.07 |
| bppd_orig | 0.69 | 0.73 | 0.6 | 0.6 | 0.73 |
| dprime_remv | 0.97 | 0.14 | -0.05 | 0.96 | 0.13 |
| beta_remv | 0.36 | 0.92 | 0.81 | 0.25 | 0.91 |
| c_remv | 0.13 | 0.91 | 0.95 | 0.03 | 0.92 |
| aprime_remv | 0.94 | -0.03 | -0.21 | 0.99 | -0.04 |
| bppd_remv | 0.37 | 0.9 | 0.81 | 0.27 | 0.9 |
| dprime_recd | 1 | 0.18 | -0.01 | 0.97 | 0.17 |
| beta_recd | 0.18 | 1 | 0.93 | 0.05 | 1 |
| c_recd | -0.01 | 0.93 | 1 | -0.14 | 0.94 |
| aprime_recd | 0.97 | 0.05 | -0.14 | 1 | 0.04 |
| bppd_recd | 0.17 | 1 | 0.94 | 0.04 | 1 |
Figure 47. Heatmap with clustering for the Spearman
correlations between SDT sensitivity and bias measures across the three
datasets. Figure generated using R version 4.3.3
(2024-02-29)
First, the sensitivity estimates d’ and A’ are very highly intercorrelated across datasets (≥0.93 for d’ and ≥0.95 for A’) and between them (between 0.91 and 0.96). Second, within each dataset, beta and c have high correlations of ~0.9. Between datasets, beta for ‘remove’ seems to have the best intercorrelations (0.92 with ‘recode’ and 0.89 with ‘original’), while the correlation between ‘original’ and ‘recode’ is 0.75; however, c shows much higher intercorrelations (between 0.82 and 0.95). B’’D for ‘original’ is moderately correlated with the other two (0.73 with ‘recode’ and 0.90 with ‘remove’), while these two correlate at 0.90. Within datasets, B’’D correlates very strongly with c (around 0.92) and almost perfectly with beta (≥0.99).
Coupled with the more natural interpretation of some of these estimates, we will focus primarily on d’ and c, but also keep A’ and B’’D in mind.
From the above, it seems that either the ‘remove’ or the ‘recode’ datasets would be best if we had to use a single dataset, and there seems to be a slight data-driven advantage for the former (in the sense that it has the best correlations with the other two datasets), but at the cost of completely losing the information provided by the “weird” items – given these, we will continue using the three datasets for now. In terms of actual estimates, we will keep the % correct responses overall and for the ‘same’ and ‘different’ items separately, while for the Signal Detection Theory we will focus on d’ and c.
However, first we will detect those participants with very high biases (one way or another):
Figure 48. Histogram of the bias c on each of the three
datasets (repeated from above). Figure generated using R version 4.3.3
(2024-02-29)
It can be seen that in all three datasets there is an overall bias towards answering ‘same’ (c > 0, highly significant in all three cases), but this bias is much smaller (and statistically highly significantly so) for the ‘remove’ (0.3) and especially the ‘recode’ (0.2) datasets than for the ‘original’ (0.52) dataset.
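As a reminder, the bias c is computed from the hit rate H and false-alarm rate FA as c = −(z(H) + z(FA))/2, which in R is simply:

```r
# illustrative (made-up) rates for one participant
hit_rate <- 0.80
fa_rate  <- 0.10

# bias c: positive values indicate a conservative bias
# (here, given the coding used, a bias towards answering 'same')
c_bias <- -(qnorm(hit_rate) + qnorm(fa_rate)) / 2
```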
Figure 49. Histogram of the bias c on each of the three
datasets. Figure generated using R version 4.3.3
(2024-02-29)
It is clear that there are some participants with strong biases in all three datasets; let’s identify them. First, there is one participant who systematically answered “same” irrespective of the item:
| age | gender | music_years | education_years | location |
|---|---|---|---|---|
| 51 | F | 0 | 2 | B |
Second, there are no other participants with strong biases (c ≤ -2 or c ≥ 2) in any dataset and only a few with |c| between 1 and 2, so let’s remove this one only.
Interestingly, while for the “original” dataset there are quite a few participants with biases |c|≥1 (2 with c≤-1 and 34 with c≥1), there are fewer for the “remove” (3 with c≤-1 and 14 with c≥1) and for the “recode” (3 with c≤-1 and 11 with c≥1) datasets, again supporting the idea that the “weird” items are treated as ‘same’ and not ‘different’.
Focusing on the “de-biased” signal sensitivity d’, it is interesting to note that it is higher, on average, on the “removed” (2.49) and “recoded” (2.3) datasets than on the “original” (2.04) dataset (all differences are highly significant as judged with two-sample t-tests), again supporting the view that the “weird” items are ‘same’ and not ‘different’.
Let’s look at the relationships between estimates within datasets:
Figure 50. Heatmap with clustering for the Spearman
correlations between various measures in each dataset. Figure generated
using R version
4.3.3 (2024-02-29)
Figure 51. The three main measures of success on each
dataset. Figure generated using R version 4.3.3
(2024-02-29)
d’ and A’ are highly correlated (≥ 0.92), as expected, and each is highly correlated with the total % correct responses: d’ between 0.87 and 0.96, while A’ is virtually perfectly correlated with it (≥0.97).
Keeping, removing or recoding the “weird” items all share the downside that they ignore the participants who seemingly answered these items correctly. Therefore, we will analyze these items (as ‘different’ items), together with their associated ‘same’ items, separately.
Figure 52. Histograms of the measures on the ‘weird’
dataset. Figure generated using R version 4.3.3
(2024-02-29)
As expected (see Table below), the overall % correct responses is low due to the very low % correct responses on the ‘different’ items, which is reflected in rather low sensitivities and in strong biases towards ‘same’ responses compared with the other three datasets:
Importantly, the maximum % correct is 94.2% overall, 100% for ‘same’ and 97.1% for ‘different’, suggesting that some participants seem to have answered correctly. Let’s see who these participants are and what their responses on the ‘non-weird’ items look like:
Figure 53. The measures of success for the ‘weird’ and
‘non-weird’ (‘remove’) items. Figure generated using R version 4.3.3
(2024-02-29)
And let’s look at their correlations:
Figure 54. Heatmap with clustering for the correlations
between all measures for the ‘weird’ and ‘non-weird’ (‘remove’ dataset)
items. Figure generated using R version 4.3.3
(2024-02-29)
The important relationships are those between the two sets (“weird” vs “non-weird” items) for:
% total correct responses: their correlation is 0.50 overall, but this is likely heavily biased by the ‘same’ items (which, as expected, behave the same for the “weird” and “non-weird” items),
% correct for the ‘different’ items: while overall there is no correlation (-0.03, p>0.05), there seem to be two or three groups of participants (cluster::clusGap()
suggests 3 clusters, while NbClust::NbClust() finds that 8
methods suggest 2 clusters and 7 suggest 3):
Figure 55. K-means clustering of the participants using
the % correct for ‘different’ on the ‘weird’ and ‘non-weird’ items, with
k=2 (left) and k=3 (right), trying to keep the colors and symbols of the
corresponding clusters similar. Please note the order of the ‘numbers’
(i.e., ‘1’, ‘2’ and ‘3’) in the legend is arbitrary. Figure generated
using R version
4.3.3 (2024-02-29)
It can be seen that the 3-clusters solution is more or less splitting one of the clusters of the 2-clusters solution, with the rough correspondences between the k=2 and k=3 clusters 1(2) ≈ 1(3) [red in the figure] and 2(2) ≈ 2(3)+3(3) [shades of blue in the figure], where the first digit is the cluster ‘number’ (which is arbitrary) and in parentheses the number of clusters, k, so 1(2) means cluster ‘1’ of the k=2 solution. Within each cluster, we have the following correlations between the “weird” and “non-weird” % correct:
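The k-means step itself can be sketched with base R’s kmeans() on simulated data shaped like two illustrative groups (the actual analysis additionally used cluster::clusGap() and NbClust::NbClust() to choose k; all names and effect sizes below are illustrative):

```r
set.seed(42)  # k-means uses random starts, so fix the seed
# simulated % correct on the 'different' items: 'non-weird' (x) vs 'weird' (y)
x <- c(rnorm(100, mean = 45, sd = 15),   # a chance-level group
       rnorm(400, mean = 85, sd = 8))    # a high-performance group
y <- c(rnorm(100, mean = 38, sd = 15),
       rnorm(400, mean = 15, sd = 10))
d <- cbind(pct_nonweird = x, pct_weird = y)

# standardize before clustering, and use many restarts for stability:
km2 <- kmeans(scale(d), centers = 2, nstart = 25)
km3 <- kmeans(scale(d), centers = 3, nstart = 25)
table(km2$cluster)   # cluster sizes for k = 2
```

With k = 3, one of the two k = 2 clusters is typically split further, which is the correspondence pattern described above.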
It can be seen that, while on the full dataset there basically is no correlation (Pearson’s seems affected by a few outliers), in clusters 1(2), 1(3), 2(2) and 3(3) there is a strong positive and highly significant correlation, while on cluster 2(3) there is no correlation.
For k=2, cluster 1(2) comprises only 95 participants (19.4%) who seem to respond at chance level to the “non-weird” ‘different’ items (between 7.1% and 81%, mean = 47.5% and median = 48.8%) and have low % for the “weird” items (between 0% and 70%, mean = 37% and median = 40%). Here, there is a significant positive relationship between the two % correct measures, the % correct of the “non-weird” items predicting very well that for the “weird” items (linear regression β=0.63±0.13, p=3.92×10-6).
The same pattern holds for the corresponding k=3 cluster 1(3): it comprises 86 participants (17.6%) who seem to respond at chance level to the “non-weird” ‘different’ items (between 7.1% and 65.5%, mean = 45.6% and median = 47%) and have low % for the “weird” items (between 0% and 70%, mean = 37.2% and median = 40%). Here, there is a significant positive relationship between the two % correct measures, the % correct of the “non-weird” items predicting very well that for the “weird” items (linear regression β=0.73±0.14, p=1.29×10-6).
For k=2, cluster 2(2) comprises the vast majority of the participants (394, or 80.6%), who have high % correct responses for the “non-weird” ‘different’ items (≥ 59.5%, mean = 86.6% and median = 88.1%) but mostly low % for the “weird” items (≤ 90%, mean = 19.1% and median = 15%). Here, there is a significant positive relationship between the two % correct measures, the % correct of the “non-weird” items predicting that for the “weird” items (linear regression β=0.60±0.08, p=1.66×10-13).
This group roughly splits in two for k=3: cluster 2(3) comprises 86 participants (17.6%) with relatively high % for the “non-weird” items (≥ 7.1%, mean = 45.6% and median = 47%) and very low % for the “weird” items (≤ 70%, mean = 37.2% and median = 40%), basically those who clearly treated the “weird” items as being of the ‘same’ type; here there is no relationship between the two % correct measures (β=0.15±0.14, p=0.281). The other part forms cluster 3(3), which comprises 265 participants (54.2%) with high % for the “non-weird” items (≥ 51.2%, mean = 84.3% and median = 85.7%) and low % for the “weird” items (≤ 25%, mean = 11% and median = 10%); here there is a weak but significant positive prediction of the % correct of the “weird” items from the “non-weird” ones (β=0.15±0.04, p=4.63×10-4).
Importantly, the % correct for the “weird” items (which are nominally ‘different’) have very strong and highly significant negative correlations with the % correct for the ‘same’ “non-weird” items, and their linear regressions have negative slopes β of comparable size (-0.90 ≤ β ≤ -0.50) on the whole dataset and in each cluster separately:
Figure 56. The % correct responses for the ‘weird’ (y
axis) vs the % correct responses for the ‘same’ ‘non-weird’
(aka ‘remove’) items (x axis) overall (left), for k=2 (middle) and k=3
(right), keeping the same cluster colors and symbols as above. Figure
generated using R
version 4.3.3 (2024-02-29)
This is precisely the pattern to be expected if the “weird” items behave like the ‘same’ items: the participants with higher performance on the same items have lower performance on the “weird” items because their responses are in fact “flipped”.
It turns out that the tone task is far from simple; in particular, the 5 items Bhx, Hst, Ist, Kkl1 and Llp (the so-called “weird” items) clearly behave like the ‘same’ items and not like the other ‘different’ items. While this is an interesting question from a phonological/phonetic point of view, we must leave it for future research.
Here, the relevant question is “what to do with these items?”. There are three basic options: (1) leave them as they are (the “original” dataset), (2) simply drop them (the “removed” dataset), or (3) consider them as ‘same’ items and “flip” their responses (the “recode” dataset). From the analyses performed above and from theoretical considerations, we will focus on the “recode” dataset but also keep the “original” dataset for reference.
With these we have the following primary and secondary measures:
Finally, please note that while d’ can arguably be modeled well using linear regression, for the % correct responses we have several potential choices:
- beta regression, implemented in R by, for example, glmmTMB(..., family=beta_family()) and standardly used to model proportion data, but it has two drawbacks: it cannot deal with proportions of exactly 0.0 and 1.0 (requiring these to be converted to something almost 0.0 and 1.0, respectively) and, for most participants, it produces relatively hard-to-interpret estimates;
- logistic regression on the counts of correct and incorrect responses, which is standardly implemented in R and whose results most practitioners know how to interpret.
Therefore, we will systematically perform logistic regression on the counts of “0” (“incorrect”) and “1” (“correct”) responses, but we might still plot and show the % of correct responses when appropriate.
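The option settled on here (logistic regression on correct/incorrect counts) needs no extra packages; a minimal sketch on simulated data (the predictor, the effect sizes and the item count of 28 are illustrative assumptions, not the actual data):

```r
set.seed(1)
n <- 200
age <- runif(n, 18, 80)
# simulate per-participant counts of correct responses out of 28 items,
# with the probability of a correct response decreasing with age:
p_correct <- plogis(3 - 0.03 * age)
correct   <- rbinom(n, size = 28, prob = p_correct)
incorrect <- 28 - correct

# logistic regression on the counts of correct ("1") and incorrect ("0") responses:
m <- glm(cbind(correct, incorrect) ~ age, family = binomial)
coef(summary(m))
```

The fitted coefficients are on the log-odds scale, which is what makes this model familiar and easy to interpret compared with the beta-regression alternative.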
Pearson’s:
Figure 57. Heatmap with clustering for the Pearson’s
correlations between the continuous measures of interest. Figure
generated using R
version 4.3.3 (2024-02-29)
Spearman’s:
Figure 58. Heatmap with clustering for the Spearman’s
correlations between the continuous measures of interest. Figure
generated using R
version 4.3.3 (2024-02-29)
It can be seen that:
These bi-variate correlations, however, might hide more complex relationships between the measures of interest and covariates, that can be tested using multiple regression, mediation and (piecewise) path and structural models.
We use here the normalized working memory performance estimate (wm_norm).
There are no significant differences between the two main locations (A and B) in terms of the working memory task performance (linear regression βB-A=-0.0075, p=0.688), and, for those 91 participants with information about family relationships, the generation they belong to also has no effect (linear regression βold-young=-0.081, p=0.122), and, moreover, there is no clustering within families (the linear model with family as a random effect has an ICC of 2.4%), suggesting that we need not model these factors here.
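The ICC quoted above can be approximated without mixed-model packages from a one-way random-effects ANOVA; a base-R sketch on simulated data (family sizes, group count and effect sizes are all illustrative assumptions):

```r
set.seed(11)
# simulated balanced design: 20 families of 5 members each,
# with a tiny family-level effect relative to the residual noise
family_id <- factor(rep(1:20, each = 5))
wm_norm <- 0.5 + rnorm(20, sd = 0.02)[as.integer(family_id)] +
  rnorm(100, sd = 0.14)

a   <- anova(lm(wm_norm ~ family_id))
msb <- a["family_id", "Mean Sq"]   # between-family mean square
msw <- a["Residuals", "Mean Sq"]   # within-family mean square
k   <- 5                           # members per family
icc <- max(0, (msb - msw) / (msb + (k - 1) * msw))
icc   # close to 0, i.e. little clustering within families
```

A small ICC like this supports dropping the family random effect, as done in the text.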
A multiple linear regression of the working memory task performance on age, gender and years of education and all their interactions simplifies (using manual simplification based on the F-test’s p value) to a model with main effects only:
Call:
lm(formula = wm_norm ~ age + gender + education_years, data = d)
Residuals:
Min 1Q Median 3Q Max
-0.45016 -0.09722 -0.00813 0.09104 0.50216
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.5539387 0.0365383 15.160 <2e-16 ***
age -0.0058270 0.0006713 -8.680 <2e-16 ***
genderM -0.0320631 0.0142637 -2.248 0.025 *
education_years 0.0206672 0.0020358 10.152 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1425 on 485 degrees of freedom
Multiple R-squared: 0.509, Adjusted R-squared: 0.506
F-statistic: 167.6 on 3 and 485 DF, p-value: < 2.2e-16
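The “manual simplification based on the F-test’s p value” can be sketched with base R’s drop1() (shown here on simulated data with illustrative effect sizes, not the actual dataset):

```r
set.seed(7)
n <- 489
d <- data.frame(
  age             = runif(n, 18, 80),
  gender          = factor(sample(c("F", "M"), n, replace = TRUE)),
  education_years = rpois(n, lambda = 8)
)
# simulate a working-memory score with main effects only (illustrative sizes):
d$wm_norm <- 0.55 - 0.006 * d$age + 0.02 * d$education_years -
  0.03 * (d$gender == "M") + rnorm(n, sd = 0.14)

# start from the model with all 2-way interactions...
m <- lm(wm_norm ~ (age + gender + education_years)^2, data = d)
# ...then F-test every droppable term (respecting marginality), refit without
# the least significant one, and repeat until all remaining terms matter:
dd <- drop1(m, test = "F")
dd
```

Since the data were simulated without interactions, the interaction terms should all come out non-significant here, leaving a main-effects-only model like the one above.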
Figure 59. Untransformed slopes with standard errors.
Figure generated using R version 4.3.3
(2024-02-29)
suggesting that age has a negative effect, years of education a positive effect, and that males have worse performance than females. However, it is possible that the causal model is more complex, with the effects of gender and age largely mediated by the years of education.
Indeed, fitting a mediation model where gender influences
working memory through years of education
(N.B. while the outcome model is a linear regression of working
memory performance on the mediator and treatment, the mediator model is
the Poisson regression of years of education on the treatment because
the years of education is a count variable) finds a highly significant
positive indirect effect (ACME), but also a significant direct effect
(ADE); as fitted using mediation::mediate():
Causal Mediation Analysis
Quasi-Bayesian Confidence Intervals
Estimate 95% CI Lower 95% CI Upper p-value
ACME 0.07474 0.04916 0.10 <2e-16 ***
ADE -0.04001 -0.07098 -0.01 0.015 *
Total Effect 0.03473 -0.00461 0.07 0.084 .
Prop. Mediated 2.05904 -7.70715 13.87 0.084 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sample Size Used: 489
Simulations: 10000
and as fitted using piecewiseSEM::psem():
Structural Equation Model of wm_task_results$med_gender__education$piecewise$model
Call:
education_years ~ gender_n
wm_norm ~ gender_n + education_years
AIC
2833.359
---
Tests of directed separation:
No independence claims present. Tests of directed separation not possible.
--
Global goodness-of-fit:
Chi-Squared = 0 with P-value = 1 and on 0 degrees of freedom
Fisher's C = NA with P-value = NA and on 0 degrees of freedom
---
Coefficients:
Response Predictor Estimate Std.Error DF Crit.Value P.Value
education_years gender_n 0.3653 0.0373 487 9.7828 0.0000
wm_norm gender_n -0.0403 0.0153 486 -2.6356 0.0087
wm_norm education_years 0.0320 0.0017 486 19.1132 0.0000
Std.Estimate
0.2563 ***
-0.0932 **
0.6755 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
---
Individual R-squared:
Response method R.squared
education_years nagelkerke 0.18
wm_norm none 0.43
Figure 60. Mediation model of gender,
years of education and working memory performance
showing the standardized coefficients (fitted using
piecewiseSEM::psem()). Figure generated using R version 4.3.3
(2024-02-29)
Moreover, testing this partial mediation model against the full mediation model (which does not include a direct effect but only the indirect effect) using d-separation and model comparison finds that while the effect of gender is mostly mediated through years of education (males have more years of education, and years of education are positively related to working memory), the direct effect of gender on working memory also matters (d-sep p=0.009, model comparison χ2(1)=6.9, p=0.008).
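The quantities reported by mediation::mediate() can be illustrated with a bare-bones product-of-coefficients sketch in base R (simulated data with illustrative effect sizes; linear models are used throughout for simplicity, whereas the mediator model above is Poisson, and the quasi-Bayesian confidence intervals that mediate() adds are omitted):

```r
set.seed(123)
n <- 489
gender_n <- rbinom(n, 1, 0.5)                            # 0 = F, 1 = M
education_years <- 8 + 2 * gender_n + rnorm(n, sd = 3)   # mediator
wm_norm <- 0.4 + 0.03 * education_years - 0.04 * gender_n +
  rnorm(n, sd = 0.14)                                    # outcome

a <- coef(lm(education_years ~ gender_n))["gender_n"]  # treatment -> mediator
fit_y <- lm(wm_norm ~ gender_n + education_years)
b <- coef(fit_y)["education_years"]                    # mediator -> outcome
acme <- unname(a * b)                                  # indirect (mediated) effect
ade  <- unname(coef(fit_y)["gender_n"])                # direct effect
c(ACME = acme, ADE = ade, Total = acme + ade)
```

The simulated data reproduce the qualitative pattern above: a positive indirect effect through education together with a negative direct effect of being male.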
Likewise, fitting a mediation model where age influences
working memory through years of education finds a
highly significant negative direct effect (ADE), but also a significant
negative mediated effect (ACME); as fitted using
mediation::mediate():
Causal Mediation Analysis
Quasi-Bayesian Confidence Intervals
Estimate 95% CI Lower 95% CI Upper p-value
ACME -0.01300 -0.02510 0.00 0.024 *
ADE -0.00594 -0.00754 0.00 <2e-16 ***
Total Effect -0.01894 -0.03072 -0.01 0.001 ***
Prop. Mediated 0.68258 0.22255 0.83 0.023 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sample Size Used: 489
Simulations: 10000
and as fitted using piecewiseSEM::psem():
Structural Equation Model of wm_task_results$med_age__education$piecewise$model
Call:
education_years ~ age
wm_norm ~ age + education_years
AIC
2289.804
---
Tests of directed separation:
No independence claims present. Tests of directed separation not possible.
--
Global goodness-of-fit:
Chi-Squared = 0 with P-value = 1 and on 0 degrees of freedom
Fisher's C = NA with P-value = NA and on 0 degrees of freedom
---
Coefficients:
Response Predictor Estimate Std.Error DF Crit.Value P.Value
education_years age -0.0326 0.0013 487 -24.2750 0
wm_norm age -0.0059 0.0007 486 -8.8119 0
wm_norm education_years 0.0196 0.0020 486 9.8602 0
Std.Estimate
-0.6394 ***
-0.3691 ***
0.4130 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
---
Individual R-squared:
Response method R.squared
education_years nagelkerke 0.7
wm_norm none 0.5
Figure 61. Mediation model of age, years
of education and working memory performance showing the
standardized coefficients (fitted using
piecewiseSEM::psem()). Figure generated using R version 4.3.3
(2024-02-29)
Moreover, the effect of age is split between the direct negative effect and the mediated (negative effect on years of education) effect (d-sep p=2.19×10-17, model comparison χ2(1)=72.5, p=0).
The working memory performance is not influenced by location (as expected) and does not cluster within families. It is strongly positively correlated with years of education (causality is unclear and could be bidirectional); it is strongly negatively influenced by age, both directly and indirectly (through age’s negative influence on years of education); and it is influenced by gender, mainly indirectly (males have more years of education) but also directly (males seem to have slightly worse performance than females).
Given that the tone task results can potentially be analyzed in several ways, we conducted a preliminary comparison to decide on the best approach here:
... + (1 | participant) + (1 | item)), which keeps the advantages of (a) but properly deals with its main shortcoming.
We compared these on our data and we found that:
Therefore, we will perform the beta regression of the % correct responses, using a mixed-effects logistic regression as a sanity check in some cases.
We focus here on the % total correct responses on the ‘recoded’
dataset estimate (pcr = percent correct
recoded). Given that these are percents bounded, by definition,
between 0% and 100%, we used beta regression (as implemented by
glmmTMB::glmmTMB()); however, the mediation modelling
function mediation::mediate() has trouble with beta
regression, as does piecewiseSEM::psem() when it comes to
fitting the full mediation model, so we employed in these cases the
equivalent linear regressions (the relevant coefficient estimates and
p-values are similar enough between the two to ensure that the
qualitative conclusions hold). Moreover, piecewiseSEM has
trouble estimating the standardized path coefficients, so we report
here the unstandardized ones.
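Because beta regression requires proportions strictly inside (0, 1), % correct values at exactly 0% or 100% must first be squeezed inwards; a base-R sketch (the Smithson-Verkuilen transformation shown here is one common choice and an assumption, not necessarily what the actual script does; the actual fit used glmmTMB::glmmTMB(..., family=beta_family())):

```r
# squeeze proportions from [0, 1] into (0, 1) (Smithson & Verkuilen 2006):
#   y' = (y * (n - 1) + 0.5) / n, with n the number of observations
squeeze <- function(y, n = length(y)) (y * (n - 1) + 0.5) / n

pcr <- c(0, 0.25, 0.5, 0.967, 1)   # illustrative proportions of correct responses
pcr_sq <- squeeze(pcr)             # no exact 0s or 1s remain
pcr_sq
# the beta regression itself would then be along the lines of (not run here):
# glmmTMB::glmmTMB(pcr_sq ~ age + gender + education_years + location_ab + wm_norm,
#                  family = glmmTMB::beta_family(), data = d)
```

The transformation is monotone, so it preserves the ordering of the participants while making the boundary values admissible.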
There is a highly significant difference between the two main locations (A has higher overall performance than B; beta regression βB-A=-0.45, p=1.45×10-9). For those 91 participants with information about family relationships, the generation they belong to has no effect (beta regression βold-young=-0.019, p=0.931). However, there is a slight clustering within families (the linear model with family as a random effect has an ICC of 9.9%, and including family as a fixed effect in a “flat” beta regression vs excluding it results in a p=0.016), but given the loss of sample size and the relatively small amount of variation explained we will ignore it here.
A multiple beta regression of pcr on working memory, age, gender, years of education and location and all their interactions simplifies (using manual simplification based on the tests’ p values) to a model with main effects and two 2-way interactions only:
Family: beta ( logit )
Formula:
pcr ~ age + gender + education_years + location_ab + wm_norm +
age:gender + age:education_years
Data: d
AIC BIC logLik deviance df.resid
-940.6 -903.3 479.3 -958.6 454
Dispersion parameter for beta family (): 11.5
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.6384067 0.4358957 3.759 0.000171 ***
age -0.0147449 0.0084924 -1.736 0.082518 .
genderM -0.6487713 0.2521834 -2.573 0.010093 *
education_years -0.0640475 0.0389790 -1.643 0.100356
location_abB -0.3905770 0.0691881 -5.645 1.65e-08 ***
wm_norm 1.4110569 0.2426382 5.815 6.05e-09 ***
age:genderM 0.0171747 0.0062264 2.758 0.005809 **
age:education_years 0.0030296 0.0008416 3.600 0.000318 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Figure 62. Untransformed slopes with standard errors.
Figure generated using R version 4.3.3
(2024-02-29)
Figure 63. Predicted values of pcr showing the
interaction of gender and age. Figure generated using
R version 4.3.3
(2024-02-29)
Figure 64. Predicted values of pcr showing the
interaction of education_years and age. Figure
generated using R
version 4.3.3 (2024-02-29)
suggesting that age and years of education have no main effects, that working memory has a positive effect, that males have better performance than females, that there are interactions between education and age and between age and gender, and that participants from A have better performance than those from B. As above, we test more complex models of causality using mediation and path analysis.
The mediation model where gender influences tone
through years of education finds a highly significant positive
indirect effect (ACME) and no direct effect (ADE); as fitted using
mediation::mediate() (with linear regression):
Causal Mediation Analysis
Quasi-Bayesian Confidence Intervals
Estimate 95% CI Lower 95% CI Upper p-value
ACME 0.03816 0.02488 0.05 <2e-16 ***
ADE -0.00215 -0.02616 0.02 0.8598
Total Effect 0.03600 0.00838 0.06 0.0082 **
Prop. Mediated 1.05577 0.61086 3.48 0.0082 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sample Size Used: 463
Simulations: 10000
and as fitted using piecewiseSEM::psem() (using linear
regression):
Structural Equation Model of tone_pcr_results$med_gender__education$piecewise$linearreg$model
Call:
education_years ~ gender_n
pcr ~ gender_n + education_years
AIC
2490.046
---
Tests of directed separation:
No independence claims present. Tests of directed separation not possible.
--
Global goodness-of-fit:
Chi-Squared = 0 with P-value = 1 and on 0 degrees of freedom
Fisher's C = NA with P-value = NA and on 0 degrees of freedom
---
Coefficients:
Response Predictor Estimate Std.Error DF Crit.Value P.Value
education_years gender_n 0.3947 0.0388 461 10.1653 0.0000
pcr gender_n -0.0020 0.0133 460 -0.1537 0.8779
pcr education_years 0.0154 0.0015 460 10.4643 0.0000
Std.Estimate
- ***
-0.0067
0.4528 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
---
Individual R-squared:
Response method R.squared
education_years nagelkerke 0.2
pcr none 0.2
Figure 65. Mediation model of gender,
years of education and pcr showing the
unstandardized coefficients (fitted using
piecewiseSEM::psem()); dotted arrows are not significant.
Figure generated using R version 4.3.3
(2024-02-29)
Moreover, testing this partial mediation model against the full mediation model using d-separation and model comparison (N.B., using linear regression) finds that there is no direct effect of gender, but that it is entirely mediated through years of education (d-sep p=0.878, model comparison χ2(1)=0.024, p=0.877).
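The model-comparison part of such a test (full vs partial mediation) can be sketched in base R as a comparison of nested outcome models with and without the direct path (simulated data with illustrative effect sizes; the actual test used piecewiseSEM):

```r
set.seed(5)
n <- 463
gender_n <- rbinom(n, 1, 0.5)
education_years <- 8 + 2 * gender_n + rnorm(n, sd = 3)
# simulate pcr with NO direct gender effect (i.e., full mediation holds):
pcr <- 0.6 + 0.015 * education_years + rnorm(n, sd = 0.07)

full    <- lm(pcr ~ education_years)              # full mediation: no direct path
partial <- lm(pcr ~ education_years + gender_n)   # partial mediation: adds it
# a significant p value would support keeping the direct path:
a <- anova(full, partial, test = "Chisq")
a
```

Here the data were simulated without a direct effect, so the comparison should typically favour the full-mediation model, mirroring the non-significant comparison reported above.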
Likewise, fitting a mediation model where age influences
tone through years of education finds a significant
positive direct (ADE) and a significant negative mediated effect (ACME);
as fitted using mediation::mediate() (with linear
regression):
Causal Mediation Analysis
Quasi-Bayesian Confidence Intervals
Estimate 95% CI Lower 95% CI Upper p-value
ACME -0.013198 -0.025383 0.0 0.0226 *
ADE 0.001814 0.000634 0.0 0.0016 **
Total Effect -0.011384 -0.023323 0.0 0.0494 *
Prop. Mediated 1.151918 0.992957 2.1 0.0268 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sample Size Used: 463
Simulations: 10000
and as fitted using piecewiseSEM::psem() (using linear
regression):
Structural Equation Model of tone_pcr_results$med_age__education$piecewise$linearreg$model
Call:
education_years ~ age
pcr ~ age + education_years
AIC
2040.451
---
Tests of directed separation:
No independence claims present. Tests of directed separation not possible.
--
Global goodness-of-fit:
Chi-Squared = 0 with P-value = 1 and on 0 degrees of freedom
Fisher's C = NA with P-value = NA and on 0 degrees of freedom
---
Coefficients:
Response Predictor Estimate Std.Error DF Crit.Value P.Value
education_years age -0.0335 0.0014 461 -23.8174 0.0000
pcr age 0.0018 0.0006 460 2.8325 0.0048
pcr education_years 0.0188 0.0019 460 10.1289 0.0000
Std.Estimate
- ***
0.1543 **
0.5516 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
---
Individual R-squared:
Response method R.squared
education_years nagelkerke 0.71
pcr none 0.22
Figure 66. Mediation model of age, years
of education and pcr showing the unstandardized
coefficients (fitted using piecewiseSEM::psem()). Figure
generated using R
version 4.3.3 (2024-02-29)
Moreover, the effect of age is split between the direct positive effect and the mediated (negative effect on years of education) effect (d-sep p=0.005, model comparison χ2(1)=8.0, p=0.0047).
Likewise, fitting a mediation model where location
influences tone through years of education finds
significant negative (i.e., that B has worse performance than A) direct
(ADE) and mediated (ACME) effects; as fitted using
mediation::mediate() (with linear regression):
Causal Mediation Analysis
Quasi-Bayesian Confidence Intervals
Estimate 95% CI Lower 95% CI Upper p-value
ACME -0.0218 -0.0352 -0.01 2e-04 ***
ADE -0.0470 -0.0708 -0.02 4e-04 ***
Total Effect -0.0688 -0.0950 -0.04 <2e-16 ***
Prop. Mediated 0.3165 0.1577 0.52 2e-04 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sample Size Used: 463
Simulations: 10000
and as fitted using piecewiseSEM::psem() (using linear
regression):
Structural Equation Model of tone_pcr_results$med_location__education$piecewise$linearreg$model
Call:
education_years ~ location_bin
pcr ~ location_bin + education_years
AIC
2529.685
---
Tests of directed separation:
No independence claims present. Tests of directed separation not possible.
--
Global goodness-of-fit:
Chi-Squared = 0 with P-value = 1 and on 0 degrees of freedom
Fisher's C = NA with P-value = NA and on 0 degrees of freedom
---
Coefficients:
Response Predictor Estimate Std.Error DF Crit.Value P.Value
education_years location_bin -0.2587 0.0385 461 -6.7199 0e+00
pcr location_bin -0.0472 0.0120 460 -3.9431 1e-04
pcr education_years 0.0144 0.0014 460 10.1260 0e+00
Std.Estimate
- ***
-0.1641 ***
0.4214 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
---
Individual R-squared:
Response method R.squared
education_years nagelkerke 0.10
pcr none 0.23
Figure 67. Mediation model of location, years
of education and pcr showing the unstandardized
coefficients (fitted using piecewiseSEM::psem()). Figure
generated using R
version 4.3.3 (2024-02-29)
Moreover, the effect of location is split between the direct and the mediated effects (d-sep p=9.3×10-5, model comparison χ2(1)=15.4, p=1e-04).
Here we want to check if there is any effect of working memory on
tone above and beyond that of education, age and gender, and we found
both significant positive direct (ADE) and mediated (ACME) effects; as
fitted using mediation::mediate() (with linear
regression):
Causal Mediation Analysis
Quasi-Bayesian Confidence Intervals
Estimate 95% CI Lower 95% CI Upper p-value
ACME 0.00360 0.00178 0.01 <2e-16 ***
ADE 0.01542 0.01152 0.02 <2e-16 ***
Total Effect 0.01902 0.01545 0.02 <2e-16 ***
Prop. Mediated 0.18836 0.09467 0.30 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sample Size Used: 463
Simulations: 10000
and as fitted using piecewiseSEM::psem() (using linear
regression):
Structural Equation Model of tone_pcr_results$med_wm__all$piecewise$linearreg$model
Call:
wm_norm ~ education_years + gender_n + age
pcr ~ wm_norm + education_years + gender_n + age
AIC
-1094.782
---
Tests of directed separation:
No independence claims present. Tests of directed separation not possible.
--
Global goodness-of-fit:
Chi-Squared = 0 with P-value = 1 and on 0 degrees of freedom
Fisher's C = NA with P-value = NA and on 0 degrees of freedom
---
Coefficients:
Response Predictor Estimate Std.Error DF Crit.Value P.Value
wm_norm education_years 0.0198 0.0021 459 9.3059 0.0000
wm_norm gender_n -0.0277 0.0146 459 -1.8919 0.0591
wm_norm age -0.0061 0.0007 459 -8.5939 0.0000
pcr wm_norm 0.1813 0.0415 458 4.3701 0.0000
pcr education_years 0.0154 0.0021 458 7.4803 0.0000
pcr gender_n -0.0007 0.0131 458 -0.0525 0.9582
pcr age 0.0029 0.0007 458 4.3316 0.0000
Std.Estimate
0.4191 ***
-0.0649
-0.3744 ***
0.2515 ***
0.4528 ***
-0.0022
0.2507 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
---
Individual R-squared:
Response method R.squared
wm_norm none 0.50
pcr none 0.25
Figure 68. Mediation model of education,
age, gender, wm and pcr showing the
unstandardized coefficients (fitted using
piecewiseSEM::psem()). Figure generated using R version 4.3.3
(2024-02-29)
It can be seen that, indeed, working memory has a positive effect on tone while also mediating some of the effects of age and education.
Finally, we fitted a complex path model involving all these factors
using piecewiseSEM::psem() (with linear regression):
Structural Equation Model of tone_pcr_results$path_model$linearreg$model
Call:
education_years ~ age + gender_n + location_bin
age ~ location_bin + gender_n
gender_n ~ location_bin
wm_norm ~ education_years + gender_n + age
pcr ~ wm_norm + age + gender_n + education_years + location_bin
AIC
5655.388
---
Tests of directed separation:
Independ.Claim Test.Type DF Crit.Value P.Value
wm_norm ~ location_bin + ... coef 458 1.7014 0.0895
--
Global goodness-of-fit:
Chi-Squared = 2.917 with P-value = 0.088 and on 1 degrees of freedom
Fisher's C = 4.826 with P-value = 0.09 and on 2 degrees of freedom
---
Coefficients:
Response Predictor Estimate Std.Error DF Crit.Value P.Value
education_years age -0.0318 0.0014 459 -22.4979 0.0000
education_years gender_n 0.2755 0.0393 459 7.0071 0.0000
education_years location_bin -0.2420 0.0385 459 -6.2790 0.0000
age location_bin -0.1792 1.1363 460 -0.1577 0.8747
age gender_n -2.8378 1.2138 460 -2.3379 0.0198
gender_n location_bin -0.0181 0.1986 461 -0.0910 0.9275
wm_norm education_years 0.0198 0.0021 459 9.3059 0.0000
wm_norm gender_n -0.0277 0.0146 459 -1.8919 0.0591
wm_norm age -0.0061 0.0007 459 -8.5939 0.0000
pcr wm_norm 0.1942 0.0410 457 4.7385 0.0000
pcr age 0.0026 0.0007 457 3.8377 0.0001
pcr gender_n 0.0030 0.0129 457 0.2311 0.8173
pcr education_years 0.0133 0.0021 457 6.2998 0.0000
pcr location_bin -0.0470 0.0119 457 -3.9514 0.0001
Std.Estimate
- ***
- ***
- ***
-0.0073
-0.1084 *
-0.005
0.4191 ***
-0.0649
-0.3744 ***
0.2693 ***
0.2206 ***
0.0097
0.389 ***
-0.1633 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
---
Individual R-squared:
Response method R.squared
education_years nagelkerke 0.76
age none 0.01
gender_n nagelkerke 0.00
wm_norm none 0.50
pcr none 0.27
Figure 69. The complex path model for pcr
showing the unstandardized coefficients (fitted using
piecewiseSEM::psem()). Figure generated using R version 4.3.3
(2024-02-29)
This confirms and extends the previous simpler mediation models in that pcr is directly influenced by working memory (positively), years of education (more is better), age (older is better) and location (A is better), but also indirectly by gender (mediated through years of education and age); both age and location also have effects mediated through years of education, and age and years of education have effects mediated through working memory.
The tone task performance as measured by the % total correct responses in the recoded dataset (aka pcr) does somewhat cluster within families (but we ignore this here due to the low sample size and variance explained); it is better for participants from A, for older participants, for those with more years of education, and for those with better working memory performance, while gender has only an indirect effect.
We focus here on the d’ estimate (dpr = d
prime recoded). Its distribution seems relatively
normal and diagnostics (not shown) suggest that, indeed, a linear model
fits this variable quite well. However, piecewiseSEM has
trouble estimating (some of) the standardized path coefficients, so we
report here the unstandardized ones.
There is a highly significant difference between the two main locations (A has higher overall performance than B; βB-A=-0.63, p=1.56×10-10). For those 91 participants with information about family relationships, the generation they belong to has no effect (βold-young=0.12, p=0.62), and, moreover, there is no clustering within families (the linear model with family as a random effect has an ICC of 4.4%), suggesting that we need not model these factors here.
A multiple linear regression of dpr on working memory, age, gender, years of education and location and all their interactions simplifies (using manual simplification based on the F-test’s p value) to a model with main effects and two 2-way interactions only:
Call:
lm(formula = dpr ~ age + gender + education_years + location_ab +
wm_norm + age:education_years + gender:education_years, data = d)
Residuals:
Min 1Q Median 3Q Max
-2.82011 -0.46418 0.07605 0.60458 2.14666
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.910244 0.506963 3.768 0.000186 ***
age -0.013304 0.010025 -1.327 0.185169
genderM 0.621352 0.194244 3.199 0.001476 **
education_years -0.055608 0.047563 -1.169 0.242956
location_abB -0.481857 0.084157 -5.726 1.87e-08 ***
wm_norm 1.639536 0.290392 5.646 2.90e-08 ***
age:education_years 0.003958 0.001014 3.901 0.000110 ***
genderM:education_years -0.090083 0.024266 -3.712 0.000231 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.8705 on 455 degrees of freedom
Multiple R-squared: 0.3593, Adjusted R-squared: 0.3495
F-statistic: 36.46 on 7 and 455 DF, p-value: < 2.2e-16
Figure 70. Untransformed slopes with standard errors. Figure generated using R version 4.3.3 (2024-02-29)
Figure 71. Predicted values of dpr showing the interaction of education_years and gender. Figure generated using R version 4.3.3 (2024-02-29)
Figure 72. Predicted values of dpr showing the interaction of education_years and age. Figure generated using R version 4.3.3 (2024-02-29)
suggesting that age and years of education have no main effects, that working memory has a positive main effect, that males have better performance than females (but with interactions between education and age, and between education and gender), and that participants from A have better performance than those from B. As above, we test more complex models of causality using mediation and path analysis.
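The manual model simplification used above can be sketched as follows (a minimal illustration, assuming a data frame d with the variables named as in the model output; not necessarily the exact code in the script):

```r
# Start from the full model with all two-way interactions, then iteratively
# drop the term with the largest non-significant F-test p-value:
full <- lm(dpr ~ (age + gender + education_years + location_ab + wm_norm)^2,
           data = d)
drop1(full, test = "F")   # inspect the p-values of the droppable terms
# ... repeat, removing one term at a time, until only significant terms
# (plus the main effects involved in retained interactions) remain:
final <- lm(dpr ~ age + gender + education_years + location_ab + wm_norm +
              age:education_years + gender:education_years, data = d)
anova(final, full)        # check the simplification is not significantly worse
```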
The mediation model where gender influences tone
through years of education finds a highly significant positive
indirect effect (ACME) and no direct effect (ADE); as fitted using
mediation::mediate() (with linear regression):
Causal Mediation Analysis
Quasi-Bayesian Confidence Intervals
Estimate 95% CI Lower 95% CI Upper p-value
ACME 0.2904 0.1874 0.40 <2e-16 ***
ADE -0.0112 -0.1980 0.17 0.91
Total Effect 0.2792 0.0679 0.49 0.01 *
Prop. Mediated 1.0329 0.5950 3.32 0.01 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sample Size Used: 463
Simulations: 10000
and as fitted using piecewiseSEM::psem():
Structural Equation Model of tone_dpr_results$med_gender__education$piecewise$model
Call:
education_years ~ gender_n
dpr ~ gender_n + education_years
AIC
4349.853
---
Tests of directed separation:
No independence claims present. Tests of directed separation not possible.
--
Global goodness-of-fit:
Chi-Squared = 0 with P-value = 1 and on 0 degrees of freedom
Fisher's C = NA with P-value = NA and on 0 degrees of freedom
---
Coefficients:
Response Predictor Estimate Std.Error DF Crit.Value P.Value Std.Estimate
education_years gender_n 0.3947 0.0388 461 10.1653 0.0000 - ***
dpr gender_n -0.0111 0.0992 460 -0.1121 0.9108 -0.0048
dpr education_years 0.1179 0.0110 460 10.7169 0.0000 0.4613 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
---
Individual R-squared:
Response method R.squared
education_years nagelkerke 0.20
dpr none 0.21
Figure 73. Mediation model of gender, years of education and dpr showing the unstandardized coefficients (fitted using piecewiseSEM::psem()); dotted arrows are not significant. Figure generated using R version 4.3.3 (2024-02-29)
Moreover, testing this partial mediation model against the full mediation model using d-separation and model comparison finds that there is no direct effect of gender: its effect is entirely mediated through years of education (d-sep p=0.911, model comparison χ²(1)=0.013, p=0.909).
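The general recipe used for these mediation analyses can be sketched as follows (an illustrative sketch only; variable names follow the output above, and the mediator and outcome models must be fitted before calling mediation::mediate()):

```r
library(mediation)
library(piecewiseSEM)

# "Classic" mediation: gender -> education_years -> dpr
m.med <- lm(education_years ~ gender_n, data = d)          # mediator model
m.out <- lm(dpr ~ gender_n + education_years, data = d)    # outcome model
med <- mediate(m.med, m.out, treat = "gender_n", mediator = "education_years",
               sims = 10000, robustSE = TRUE)
summary(med)  # reports ACME, ADE, total effect and proportion mediated

# Piecewise SEM version of the same (partial) mediation model:
psem.partial <- psem(m.med, m.out)
# Full mediation model (the direct gender -> dpr path omitted):
psem.full <- psem(m.med, lm(dpr ~ education_years, data = d))
dSep(psem.full)                 # d-separation test of the omitted path
anova(psem.full, psem.partial)  # chi-squared model comparison
```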
Likewise, fitting a mediation model where age influences
tone through years of education finds a significant
positive direct (ADE) and a significant negative mediated effect (ACME);
as fitted using mediation::mediate() (with linear
regression):
Causal Mediation Analysis
Quasi-Bayesian Confidence Intervals
Estimate 95% CI Lower 95% CI Upper p-value
ACME -0.10432 -0.19937 -0.01 0.0242 *
ADE 0.01632 0.00703 0.03 0.0002 ***
Total Effect -0.08800 -0.18221 0.00 0.0572 .
Prop. Mediated 1.17639 0.59488 2.30 0.0330 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sample Size Used: 463
Simulations: 10000
and as fitted using piecewiseSEM::psem():
Structural Equation Model of tone_dpr_results$med_age__education$piecewise$model
Call:
education_years ~ age
dpr ~ age + education_years
AIC
3896.584
---
Tests of directed separation:
No independence claims present. Tests of directed separation not possible.
--
Global goodness-of-fit:
Chi-Squared = 0 with P-value = 1 and on 0 degrees of freedom
Fisher's C = NA with P-value = NA and on 0 degrees of freedom
---
Coefficients:
Response Predictor Estimate Std.Error DF Crit.Value P.Value Std.Estimate
education_years age -0.0335 0.0014 461 -23.8174 0e+00 - ***
dpr age 0.0163 0.0047 460 3.4264 7e-04 0.1849 ***
dpr education_years 0.1484 0.0138 460 10.7601 0e+00 0.5807 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
---
Individual R-squared:
Response method R.squared
education_years nagelkerke 0.71
dpr none 0.23
Figure 74. Mediation model of age, years of education and dpr showing the unstandardized coefficients (fitted using piecewiseSEM::psem()). Figure generated using R version 4.3.3 (2024-02-29)
Moreover, the effect of age is split between a positive direct effect and a negative effect mediated through years of education (d-sep p=6.66×10⁻⁴, model comparison χ²(1)=11.7, p=6×10⁻⁴).
Likewise, fitting a mediation model where location
influences tone through years of education finds
significant negative (i.e., that B has worse performance than A) direct
(ADE) and mediated (ACME) effects; as fitted using
mediation::mediate() (with linear regression):
Causal Mediation Analysis
Quasi-Bayesian Confidence Intervals
Estimate 95% CI Lower 95% CI Upper p-value
ACME -0.164 -0.261 -0.07 2e-04 ***
ADE -0.466 -0.633 -0.30 <2e-16 ***
Total Effect -0.629 -0.819 -0.44 <2e-16 ***
Prop. Mediated 0.259 0.131 0.40 2e-04 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sample Size Used: 463
Simulations: 10000
and as fitted using piecewiseSEM::psem():
Structural Equation Model of tone_dpr_results$med_location__education$piecewise$model
Call:
education_years ~ location_bin
dpr ~ location_bin + education_years
AIC
4377.625
---
Tests of directed separation:
No independence claims present. Tests of directed separation not possible.
--
Global goodness-of-fit:
Chi-Squared = 0 with P-value = 1 and on 0 degrees of freedom
Fisher's C = NA with P-value = NA and on 0 degrees of freedom
---
Coefficients:
Response Predictor Estimate Std.Error DF Crit.Value P.Value Std.Estimate
education_years location_bin -0.2587 0.0385 461 -6.7199 0 - ***
dpr location_bin -0.4654 0.0881 460 -5.2803 0 -0.2158 ***
dpr education_years 0.1076 0.0104 460 10.3027 0 0.4211 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
---
Individual R-squared:
Response method R.squared
education_years nagelkerke 0.10
dpr none 0.26
Figure 75. Mediation model of location, years of education and dpr showing the unstandardized coefficients (fitted using piecewiseSEM::psem()). Figure generated using R version 4.3.3 (2024-02-29)
Moreover, the effect of location is split between the direct and the mediated effects (d-sep p=1.99×10⁻⁷, model comparison χ²(1)=27.2, p<10⁻⁴).
Here we check if there is any effect of working memory on
tone above and beyond that of education, age and gender, and we find
both significant positive direct (ADE) and mediated (ACME) effects; as
fitted using mediation::mediate() (with linear
regression):
Causal Mediation Analysis
Quasi-Bayesian Confidence Intervals
Estimate 95% CI Lower 95% CI Upper p-value
ACME 0.0307 0.0170 0.05 <2e-16 ***
ADE 0.1195 0.0901 0.15 <2e-16 ***
Total Effect 0.1502 0.1229 0.18 <2e-16 ***
Prop. Mediated 0.2034 0.1120 0.31 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sample Size Used: 463
Simulations: 10000
and as fitted using piecewiseSEM::psem():
Structural Equation Model of psem_dpr_wm
Call:
wm_norm ~ education_years + gender_n + age
dpr ~ wm_norm + education_years + gender_n + age
AIC
755.093
---
Tests of directed separation:
No independence claims present. Tests of directed separation not possible.
--
Global goodness-of-fit:
Chi-Squared = 0 with P-value = 1 and on 0 degrees of freedom
Fisher's C = NA with P-value = NA and on 0 degrees of freedom
---
Coefficients:
Response Predictor Estimate Std.Error DF Crit.Value P.Value Std.Estimate
wm_norm education_years 0.0198 0.0021 459 9.3059 0.0000 0.4191 ***
wm_norm gender_n -0.0277 0.0146 459 -1.8919 0.0591 -0.0649
wm_norm age -0.0061 0.0007 459 -8.5939 0.0000 -0.3744 ***
dpr wm_norm 1.5471 0.3059 458 5.0572 0.0000 0.2865 ***
dpr education_years 0.1194 0.0152 458 7.8449 0.0000 0.4674 ***
dpr gender_n -0.0011 0.0963 458 -0.0113 0.9910 -0.0005
dpr age 0.0259 0.0050 458 5.1699 0.0000 0.2945 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
---
Individual R-squared:
Response method R.squared
wm_norm none 0.50
dpr none 0.27
Figure 76. Mediation model of education, age, gender, wm and dpr showing the unstandardized coefficients (fitted using piecewiseSEM::psem()). Figure generated using R version 4.3.3 (2024-02-29)
It can be seen that, indeed, working memory has a positive effect on tone while also mediating some of the effects of age and education.
Finally, we fitted a complex path model involving all these factors
using piecewiseSEM::psem():
Structural Equation Model of psem_dpr_path
Call:
education_years ~ age + gender_n + location_bin
age ~ location_bin + gender_n
gender_n ~ location_bin
wm_norm ~ education_years + gender_n + age
dpr ~ wm_norm + age + gender_n + education_years + location_bin
AIC
7492.614
---
Tests of directed separation:
Independ.Claim Test.Type DF Crit.Value P.Value
wm_norm ~ location_bin + ... coef 458 1.7014 0.0895
--
Global goodness-of-fit:
Chi-Squared = 2.917 with P-value = 0.088 and on 1 degrees of freedom
Fisher's C = 4.826 with P-value = 0.09 and on 2 degrees of freedom
---
Coefficients:
Response Predictor Estimate Std.Error DF Crit.Value P.Value Std.Estimate
education_years age -0.0318 0.0014 459 -22.4979 0.0000 - ***
education_years gender_n 0.2755 0.0393 459 7.0071 0.0000 - ***
education_years location_bin -0.2420 0.0385 459 -6.2790 0.0000 - ***
age location_bin -0.1792 1.1363 460 -0.1577 0.8747 -0.0073
age gender_n -2.8378 1.2138 460 -2.3379 0.0198 -0.1084 *
gender_n location_bin -0.0181 0.1986 461 -0.0910 0.9275 -0.005
wm_norm education_years 0.0198 0.0021 459 9.3059 0.0000 0.4191 ***
wm_norm gender_n -0.0277 0.0146 459 -1.8919 0.0591 -0.0649
wm_norm age -0.0061 0.0007 459 -8.5939 0.0000 -0.3744 ***
dpr wm_norm 1.6736 0.2980 457 5.6161 0.0000 0.3099 ***
dpr age 0.0224 0.0049 457 4.5670 0.0000 0.2549 ***
dpr gender_n 0.0351 0.0938 457 0.3739 0.7087 0.0152
dpr education_years 0.0980 0.0153 457 6.3950 0.0000 0.3833 ***
dpr location_bin -0.4637 0.0865 457 -5.3576 0.0000 -0.215 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
---
Individual R-squared:
Response method R.squared
education_years nagelkerke 0.76
age none 0.01
gender_n nagelkerke 0.00
wm_norm none 0.50
dpr none 0.32
Figure 77. The complex path model for dpr showing the unstandardized coefficients (fitted using piecewiseSEM::psem()). Figure generated using R version 4.3.3 (2024-02-29)
This confirms and extends the previous simpler mediation models in that dpr is directly influenced by working memory (positively), years of education (more is better), age (older is better) and location (A is better), but also indirectly by gender (mediated through years of education and age); both age and location also have effects mediated through years of education, and age and years of education have effects mediated through working memory.
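Schematically, a path model like the one above can be specified as follows (a sketch assuming the same variable names; for simplicity all responses use linear regressions here, whereas the actual script fits some responses, such as gender_n, with generalized models, as the Nagelkerke R² values in the output indicate):

```r
library(piecewiseSEM)

# One component model per endogenous variable, combined into a piecewise SEM:
psem_dpr_path <- psem(
  lm(education_years ~ age + gender_n + location_bin, data = d),
  lm(age ~ location_bin + gender_n, data = d),
  lm(gender_n ~ location_bin, data = d),
  lm(wm_norm ~ education_years + gender_n + age, data = d),
  lm(dpr ~ wm_norm + age + gender_n + education_years + location_bin, data = d)
)
summary(psem_dpr_path)  # coefficients, d-separation tests, Fisher's C, AIC
plot(psem_dpr_path)     # path diagram with the fitted coefficients
```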
As expected, d’ and % correct total responses in the recoded dataset behave very similarly, down to very similar coefficient estimates and associated p-values. We compared (using AIC) the two measures (a rather “apples to pears” comparison in this case) using the multiple regression model ΔAIC(%cr, d’) = -2136.0, and the complex path model ΔAIC(%cr, d’) = -1837.2, both suggesting that %cr fits the data much better than d’.
Here we look at secondary measures.
We focus here on the % total correct responses estimated on the 'original' dataset (pco = percent correct original) – see the comments above for pcr.
There is a highly significant difference between the two main locations (A has higher overall performance than B; beta regression βB-A=-0.3, p=2.73×10⁻⁷). For those 91 participants with information about family relationships, the generation they belong to has no effect (beta regression βold-young=0.07, p=0.619) and there is no clustering within families (the linear model with family as a random effect has an ICC of 9.8%, and including family as a fixed effect in a “flat” beta regression vs excluding it results in p=0.295).
Family: beta ( logit )
Formula:
pco ~ age + gender + education_years + location_ab + wm_norm +
gender:education_years + education_years:wm_norm
Data: d
AIC BIC logLik deviance df.resid
-909.2 -872.0 463.6 -927.2 454
Dispersion parameter for beta family (): 17.8
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.037382 0.199472 -0.187 0.851344
age 0.010988 0.003276 3.354 0.000796 ***
genderM 0.336120 0.122824 2.737 0.006208 **
education_years 0.116932 0.016936 6.904 5.04e-12 ***
location_abB -0.243294 0.053010 -4.590 4.44e-06 ***
wm_norm 1.608457 0.298826 5.383 7.34e-08 ***
genderM:education_years -0.052539 0.015700 -3.346 0.000819 ***
education_years:wm_norm -0.107255 0.034818 -3.080 0.002067 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Figure 78. Untransformed slopes with standard errors. Figure generated using R version 4.3.3 (2024-02-29)
Figure 79. Predicted values of pco showing the interaction of gender and age. Figure generated using R version 4.3.3 (2024-02-29)
Figure 80. Predicted values of pco showing the interaction of education_years and age. Figure generated using R version 4.3.3 (2024-02-29)
Causal Mediation Analysis
Quasi-Bayesian Confidence Intervals
Estimate 95% CI Lower 95% CI Upper p-value
ACME 0.03391 0.02207 0.05 <2e-16 ***
ADE -0.00220 -0.02378 0.02 0.84
Total Effect 0.03171 0.00673 0.06 0.01 **
Prop. Mediated 1.06049 0.61053 3.69 0.01 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sample Size Used: 463
Simulations: 10000
Structural Equation Model of tone_pco_results$med_gender__education$piecewise$linearreg$model
Call:
education_years ~ gender_n
pco ~ gender_n + education_years
AIC
2376.695
---
Tests of directed separation:
No independence claims present. Tests of directed separation not possible.
--
Global goodness-of-fit:
Chi-Squared = 0 with P-value = 1 and on 0 degrees of freedom
Fisher's C = NA with P-value = NA and on 0 degrees of freedom
---
Coefficients:
Response Predictor Estimate Std.Error DF Crit.Value P.Value Std.Estimate
education_years gender_n 0.3947 0.0388 461 10.1653 0.0000 - ***
pco gender_n -0.0022 0.0118 460 -0.1836 0.8544 -0.0079
pco education_years 0.0137 0.0013 460 10.5265 0.0000 0.455 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
---
Individual R-squared:
Response method R.squared
education_years nagelkerke 0.20
pco none 0.21
Figure 81. Mediation model. Figure generated using R version 4.3.3
(2024-02-29)
d-sep p=0.854, model comparison χ²(1)=0.034, p=0.854
Causal Mediation Analysis
Quasi-Bayesian Confidence Intervals
Estimate 95% CI Lower 95% CI Upper p-value
ACME -0.011765 -0.022675 0.00 0.0202 *
ADE 0.001689 0.000639 0.00 0.0028 **
Total Effect -0.010077 -0.020841 0.00 0.0486 *
Prop. Mediated 1.160614 0.986975 2.16 0.0286 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sample Size Used: 463
Simulations: 10000
Structural Equation Model of tone_pco_results$med_age__education$piecewise$linearreg$model
Call:
education_years ~ age
pco ~ age + education_years
AIC
1926.169
---
Tests of directed separation:
No independence claims present. Tests of directed separation not possible.
--
Global goodness-of-fit:
Chi-Squared = 0 with P-value = 1 and on 0 degrees of freedom
Fisher's C = NA with P-value = NA and on 0 degrees of freedom
---
Coefficients:
Response Predictor Estimate Std.Error DF Crit.Value P.Value Std.Estimate
education_years age -0.0335 0.0014 461 -23.8174 0.0000 - ***
pco age 0.0017 0.0006 460 2.9957 0.0029 0.1628 **
pco education_years 0.0169 0.0016 460 10.2867 0.0000 0.5591 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
---
Individual R-squared:
Response method R.squared
education_years nagelkerke 0.71
pco none 0.22
Figure 82. Mediation model. Figure generated using R version 4.3.3
(2024-02-29)
d-sep p=0.003, model comparison χ²(1)=8.9, p=0.0028.
Causal Mediation Analysis
Quasi-Bayesian Confidence Intervals
Estimate 95% CI Lower 95% CI Upper p-value
ACME -0.0194 -0.0309 -0.01 8e-04 ***
ADE -0.0403 -0.0607 -0.02 2e-04 ***
Total Effect -0.0597 -0.0827 -0.04 <2e-16 ***
Prop. Mediated 0.3235 0.1610 0.53 8e-04 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sample Size Used: 463
Simulations: 10000
Structural Equation Model of tone_pco_results$med_location__education$piecewise$linearreg$model
Call:
education_years ~ location_bin
pco ~ location_bin + education_years
AIC
2417.395
---
Tests of directed separation:
No independence claims present. Tests of directed separation not possible.
--
Global goodness-of-fit:
Chi-Squared = 0 with P-value = 1 and on 0 degrees of freedom
Fisher's C = NA with P-value = NA and on 0 degrees of freedom
---
Coefficients:
Response Predictor Estimate Std.Error DF Crit.Value P.Value Std.Estimate
education_years location_bin -0.2587 0.0385 461 -6.7199 0e+00 - ***
pco location_bin -0.0404 0.0106 460 -3.8039 2e-04 -0.1583 ***
pco education_years 0.0128 0.0013 460 10.1948 0e+00 0.4243 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
---
Individual R-squared:
Response method R.squared
education_years nagelkerke 0.10
pco none 0.23
Figure 83. Mediation model. Figure generated using R version 4.3.3
(2024-02-29)
d-sep p=1.62×10⁻⁴, model comparison χ²(1)=14.3, p=2×10⁻⁴
Causal Mediation Analysis
Quasi-Bayesian Confidence Intervals
Estimate 95% CI Lower 95% CI Upper p-value
ACME 0.00324 0.00165 0.01 <2e-16 ***
ADE 0.01385 0.01047 0.02 <2e-16 ***
Total Effect 0.01709 0.01395 0.02 <2e-16 ***
Prop. Mediated 0.18824 0.09633 0.30 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sample Size Used: 463
Simulations: 10000
Structural Equation Model of tone_pco_results$med_wm__all$piecewise$linearreg$model
Call:
wm_norm ~ education_years + gender_n + age
pco ~ wm_norm + education_years + gender_n + age
AIC
-1209.875
---
Tests of directed separation:
No independence claims present. Tests of directed separation not possible.
--
Global goodness-of-fit:
Chi-Squared = 0 with P-value = 1 and on 0 degrees of freedom
Fisher's C = NA with P-value = NA and on 0 degrees of freedom
---
Coefficients:
Response Predictor Estimate Std.Error DF Crit.Value P.Value Std.Estimate
wm_norm education_years 0.0198 0.0021 459 9.3059 0.0000 0.4191 ***
wm_norm gender_n -0.0277 0.0146 459 -1.8919 0.0591 -0.0649
wm_norm age -0.0061 0.0007 459 -8.5939 0.0000 -0.3744 ***
pco wm_norm 0.1634 0.0366 458 4.4599 0.0000 0.2559 ***
pco education_years 0.0139 0.0018 458 7.6064 0.0000 0.4591 ***
pco gender_n -0.0011 0.0115 458 -0.0920 0.9268 -0.0039
pco age 0.0027 0.0006 458 4.5253 0.0000 0.2611 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
---
Individual R-squared:
Response method R.squared
wm_norm none 0.50
pco none 0.25
Figure 84. Mediation model. Figure generated using R version 4.3.3
(2024-02-29)
Structural Equation Model of tone_pco_results$path_model$linearreg$model
Call:
education_years ~ age + gender_n + location_bin
age ~ location_bin + gender_n
gender_n ~ location_bin
wm_norm ~ education_years + gender_n + age
pco ~ wm_norm + age + gender_n + education_years + location_bin
AIC
5541.539
---
Tests of directed separation:
Independ.Claim Test.Type DF Crit.Value P.Value
wm_norm ~ location_bin + ... coef 458 1.7014 0.0895
--
Global goodness-of-fit:
Chi-Squared = 2.917 with P-value = 0.088 and on 1 degrees of freedom
Fisher's C = 4.826 with P-value = 0.09 and on 2 degrees of freedom
---
Coefficients:
Response Predictor Estimate Std.Error DF Crit.Value P.Value Std.Estimate
education_years age -0.0318 0.0014 459 -22.4979 0.0000 - ***
education_years gender_n 0.2755 0.0393 459 7.0071 0.0000 - ***
education_years location_bin -0.2420 0.0385 459 -6.2790 0.0000 - ***
age location_bin -0.1792 1.1363 460 -0.1577 0.8747 -0.0073
age gender_n -2.8378 1.2138 460 -2.3379 0.0198 -0.1084 *
gender_n location_bin -0.0181 0.1986 461 -0.0910 0.9275 -0.005
wm_norm education_years 0.0198 0.0021 459 9.3059 0.0000 0.4191 ***
wm_norm gender_n -0.0277 0.0146 459 -1.8919 0.0591 -0.0649
wm_norm age -0.0061 0.0007 459 -8.5939 0.0000 -0.3744 ***
pco wm_norm 0.1743 0.0362 457 4.8103 0.0000 0.2729 ***
pco age 0.0024 0.0006 457 4.0484 0.0001 0.2323 ***
pco gender_n 0.0020 0.0114 457 0.1795 0.8577 0.0075
pco education_years 0.0120 0.0019 457 6.4562 0.0000 0.398 ***
pco location_bin -0.0399 0.0105 457 -3.7875 0.0002 -0.1563 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
---
Individual R-squared:
Response method R.squared
education_years nagelkerke 0.76
age none 0.01
gender_n nagelkerke 0.00
wm_norm none 0.50
pco none 0.28
Figure 85. The complex path model. Figure generated using R version 4.3.3 (2024-02-29)
pco behaves relatively similarly to pcr, and we compared (using AIC) the two (more meaningful here) using the multiple regression model ΔAIC(pcr, pco) = -31.3, and the complex path model ΔAIC(pcr, pco) = 113.8, both suggesting that pcr (the 'recoded' dataset) fits the data better than pco (the 'original' dataset).
We focus here on d’ estimated on the ‘original’ dataset (dpo = d-prime original) – see the comments above for dpr.
There is a highly significant difference between the two main locations (A has higher overall performance than B; βB-A=-0.63, p=1.56×10⁻¹⁰). For those 91 participants with information about family relationships, the generation they belong to has no effect (βold-young=0.12, p=0.62), and, moreover, there is no clustering within families (the beta model with family as a random effect has an ICC of 4.4%), suggesting that we need not model these factors here.
Call:
lm(formula = dpo ~ age + gender + education_years + location_ab +
wm_norm + age:education_years + gender:education_years, data = d)
Residuals:
Min 1Q Median 3Q Max
-2.53839 -0.35857 0.08772 0.51313 1.87452
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.4281037 0.4467995 3.196 0.001489 **
age -0.0075783 0.0088356 -0.858 0.391504
genderM 0.5576186 0.1711920 3.257 0.001209 **
education_years -0.0297572 0.0419181 -0.710 0.478137
location_abB -0.3529232 0.0741694 -4.758 2.63e-06 ***
wm_norm 1.4759196 0.2559303 5.767 1.49e-08 ***
age:education_years 0.0030647 0.0008941 3.428 0.000664 ***
genderM:education_years -0.0730950 0.0213866 -3.418 0.000688 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.7672 on 455 degrees of freedom
Multiple R-squared: 0.3526, Adjusted R-squared: 0.3427
F-statistic: 35.41 on 7 and 455 DF, p-value: < 2.2e-16
Figure 86. Untransformed slopes with standard errors. Figure generated using R version 4.3.3 (2024-02-29)
Figure 87. Predicted values of dpo showing the interaction of gender and age. Figure generated using R version 4.3.3 (2024-02-29)
Figure 88. Predicted values of dpo showing the interaction of education_years and age. Figure generated using R version 4.3.3 (2024-02-29)
Causal Mediation Analysis
Quasi-Bayesian Confidence Intervals
Estimate 95% CI Lower 95% CI Upper p-value
ACME 0.2583 0.1664 0.36 <2e-16 ***
ADE 0.0425 -0.1220 0.21 0.6134
Total Effect 0.3008 0.1122 0.49 0.0014 **
Prop. Mediated 0.8609 0.5256 2.04 0.0014 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sample Size Used: 463
Simulations: 10000
Structural Equation Model of tone_dpo_results$med_gender__education$piecewise$model
Call:
education_years ~ gender_n
dpo ~ gender_n + education_years
AIC
4220.805
---
Tests of directed separation:
No independence claims present. Tests of directed separation not possible.
--
Global goodness-of-fit:
Chi-Squared = 0 with P-value = 1 and on 0 degrees of freedom
Fisher's C = NA with P-value = NA and on 0 degrees of freedom
---
Coefficients:
Response Predictor Estimate Std.Error DF Crit.Value P.Value Std.Estimate
education_years gender_n 0.3947 0.0388 461 10.1653 0.000 - ***
dpo gender_n 0.0427 0.0863 460 0.4948 0.621 0.0211
dpo education_years 0.1046 0.0096 460 10.9325 0.000 0.4669 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
---
Individual R-squared:
Response method R.squared
education_years nagelkerke 0.20
dpo none 0.22
Figure 89. Mediation model. Figure generated using R version 4.3.3
(2024-02-29)
d-sep p=0.621, model comparison χ²(1)=0.25, p=0.62
Causal Mediation Analysis
Quasi-Bayesian Confidence Intervals
Estimate 95% CI Lower 95% CI Upper p-value
ACME -0.09232 -0.17748 -0.01 0.0250 *
ADE 0.01412 0.00629 0.02 0.0004 ***
Total Effect -0.07820 -0.16227 0.00 0.0544 .
Prop. Mediated 1.17229 0.65176 2.14 0.0294 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sample Size Used: 463
Simulations: 10000
Figure 90. Mediation model. Figure generated using R version 4.3.3
(2024-02-29)
d-sep p=6.86×10⁻⁴, model comparison χ²(1)=11.6, p=7×10⁻⁴.
Causal Mediation Analysis
Quasi-Bayesian Confidence Intervals
Estimate 95% CI Lower 95% CI Upper p-value
ACME -0.150 -0.239 -0.07 4e-04 ***
ADE -0.336 -0.486 -0.19 <2e-16 ***
Total Effect -0.487 -0.659 -0.32 <2e-16 ***
Prop. Mediated 0.307 0.159 0.48 4e-04 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sample Size Used: 463
Simulations: 10000
Figure 91. Mediation model. Figure generated using R version 4.3.3
(2024-02-29)
d-sep p=1.62×10⁻⁵, model comparison χ²(1)=18.7, p<10⁻⁴
Causal Mediation Analysis
Quasi-Bayesian Confidence Intervals
Estimate 95% CI Lower 95% CI Upper p-value
ACME 0.0280 0.0164 0.04 <2e-16 ***
ADE 0.1040 0.0785 0.13 <2e-16 ***
Total Effect 0.1320 0.1082 0.16 <2e-16 ***
Prop. Mediated 0.2110 0.1214 0.32 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sample Size Used: 463
Simulations: 10000
Figure 92. Mediation model. Figure generated using R version 4.3.3
(2024-02-29)
Structural Equation Model of psem_dpo_path
Call:
education_years ~ age + gender_n + location_bin
age ~ location_bin + gender_n
gender_n ~ location_bin
wm_norm ~ education_years + gender_n + age
dpo ~ wm_norm + age + gender_n + education_years + location_bin
AIC
7369.870
---
Tests of directed separation:
Independ.Claim Test.Type DF Crit.Value P.Value
wm_norm ~ location_bin + ... coef 458 1.7014 0.0895
--
Global goodness-of-fit:
Chi-Squared = 2.917 with P-value = 0.088 and on 1 degrees of freedom
Fisher's C = 4.826 with P-value = 0.09 and on 2 degrees of freedom
---
Coefficients:
Response Predictor Estimate Std.Error DF Crit.Value P.Value Std.Estimate
education_years age -0.0318 0.0014 459 -22.4979 0.0000 - ***
education_years gender_n 0.2755 0.0393 459 7.0071 0.0000 - ***
education_years location_bin -0.2420 0.0385 459 -6.2790 0.0000 - ***
age location_bin -0.1792 1.1363 460 -0.1577 0.8747 -0.0073
age gender_n -2.8378 1.2138 460 -2.3379 0.0198 -0.1084 *
gender_n location_bin -0.0181 0.1986 461 -0.0910 0.9275 -0.005
wm_norm education_years 0.0198 0.0021 459 9.3059 0.0000 0.4191 ***
wm_norm gender_n -0.0277 0.0146 459 -1.8919 0.0591 -0.0649
wm_norm age -0.0061 0.0007 459 -8.5939 0.0000 -0.3744 ***
dpo wm_norm 1.5057 0.2610 457 5.7688 0.0000 0.318 ***
dpo age 0.0201 0.0043 457 4.6820 0.0000 0.261 ***
dpo gender_n 0.0802 0.0821 457 0.9763 0.3294 0.0397
dpo education_years 0.0884 0.0134 457 6.5909 0.0000 0.3947 ***
dpo location_bin -0.3379 0.0758 457 -4.4585 0.0000 -0.1787 ***
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
---
Individual R-squared:
Response method R.squared
education_years nagelkerke 0.76
age none 0.01
gender_n nagelkerke 0.00
wm_norm none 0.50
dpo none 0.32
Figure 93. The complex path model. Figure generated using R version 4.3.3 (2024-02-29)
dpo behaves relatively similarly to dpr, and we compared (using AIC) the two (more meaningful here) using the multiple regression model ΔAIC(dpr, dpo) = 117.0, and the complex path model ΔAIC(dpr, dpo) = 122.7, both suggesting that dpr (the 'recoded' dataset) fits the data worse than dpo (the 'original' dataset). We also compared dpo and pco using the multiple regression model ΔAIC(%cr, d’) = -1987.7, and the complex path model ΔAIC(%cr, d’) = -1828.3, both suggesting that %cr fits the data much better than d’.
Finally, let’s compare both measures on both datasets:
It can be seen that the ordering (taking into account various caveats) is: pcr > pco >> dpo > dpr, suggesting that, indeed, the % correct responses on the ‘recoded’ dataset might be the best choice.
The bias c on the ‘recoded’ dataset (cbr from c bias recoded) shows some clustering within families (18.2%), but no effects of generation nor of location. While the clustering within families is potentially interesting (and might suggest that the bias has a familial component, which may have shared environmental and/or genetic components) we cannot really model it here due to the massive loss of data (dropping from 489 to 91 participants).
Interestingly, including all the potential predictors (and their interactions) in a multiple linear regression results in a model that is not better than the null model with no predictor (F(31)=1.1, p=0.377), suggesting that the bias c is idiosyncratic, probably influenced by personality, genetic and/or cultural variables that we did not measure here.
The bias c on the ‘original’ dataset (cbo from c bias original) shows some clustering within families (12.3%), but no effects of generation nor of location. As above, while the clustering within families is potentially interesting we cannot really model it here due to the massive loss of data.
Including all the potential predictors (and their interactions) in a multiple linear regression model results, after manual simplification, in only years of education having a positive significant effect (i.e., participants with more years of education have a tendency to answer ‘same’ which might simply say something about how they treat the “weird” items):
Call:
lm(formula = cbo ~ education_years, data = d)
Residuals:
Min 1Q Median 3Q Max
-1.98951 -0.25221 0.00983 0.24083 1.24920
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.389935 0.028387 13.736 < 2e-16 ***
education_years 0.020921 0.003914 5.345 1.43e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3553 on 461 degrees of freedom
Multiple R-squared: 0.05835, Adjusted R-squared: 0.0563
F-statistic: 28.56 on 1 and 461 DF, p-value: 1.428e-07
Here we compare side-by-side the complex path models (with linear instead of Beta regressions and showing the unstandardized coefficients) for the original (“o”), recoded (“r”) and removed (“x”) datasets.
Figure 94. The complex path model for pco.
Figure generated using R version 4.3.3
(2024-02-29)
Figure 95. The complex path model for pcr.
Figure generated using R version 4.3.3
(2024-02-29)
Figure 96. The complex path model for pcx.
Figure generated using R version 4.3.3
(2024-02-29)
Putting everything together, it seems that:
- on the ‘recoded’ dataset (i.e., with the “weird” items recoded as ‘same’ items), both measures of performance, the % total correct responses and the bias-free d’, behave in very similar ways;
- on the ‘original’ dataset, the two measures also behave similarly;
- the % total correct responses on the ‘recoded’ dataset fits the data best (in terms of AIC), so, all in all, we should probably take it as our primary measure.
Here we give some technical details, also citing the most important methodological packages we use in this paper.
PCA: we used prcomp(...) in
package stats to estimate the PCs (which uses the singular
value decomposition method), and fviz_eig(...) and
fviz_pca_var(...) in package factoextra (Kassambara & Mundt, 2020).
EFA: the preliminary tests use KMO(...) in package psych (William Revelle, 2023), cortest.bartlett(...) also in package psych, and det(cor(...)); the most likely number of latent factors is estimated using several methods implemented by fa.parallel(...) and nfactors(...) in package psych. The EFA itself uses factanal(...) from package stats (with the “promax” rotation) and the resulting model was plotted using fa.diagram(...) in package psych.
CFA: we implemented the model fit using cfa(...) and the fit measures using fitMeasures(...) from package lavaan (Rosseel, 2012), and we plotted the fitted models using lavaanPlot(...) from package lavaanPlot (Lishinski, 2021).
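These steps might be sketched as follows (the data frame items, the two-factor structure and the item names i1…i6 are purely illustrative, not our actual model):

```r
library(psych)
library(lavaan)
library(lavaanPlot)

# Preliminary checks on a hypothetical data frame `items`:
KMO(items)                                      # sampling adequacy
cortest.bartlett(cor(items), n = nrow(items))   # Bartlett's test of sphericity
det(cor(items))                                 # determinant of the correlation matrix
fa.parallel(items)                              # suggested number of latent factors

# EFA with an oblique ("promax") rotation:
efa <- factanal(items, factors = 2, rotation = "promax")
fa.diagram(efa)

# CFA of an illustrative two-factor model:
model <- ' f1 =~ i1 + i2 + i3
           f2 =~ i4 + i5 + i6 '
fit <- cfa(model, data = items)
fitMeasures(fit, c("cfi", "tli", "rmsea", "srmr"))
lavaanPlot(model = fit, coefs = TRUE)
```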
Mokken: we estimated the Guttman errors using check.errors(...), the scalability coefficients H with coefH(...), and the automated item selection procedure with aisp(...), all in the package mokken (Ark, 2007).
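As a sketch (X is a hypothetical matrix of dichotomous item scores, rows being participants):

```r
library(mokken)

check.errors(X)   # Guttman errors
coefH(X)          # scalability coefficients H (per item pair, item and scale)
aisp(X)           # automated item selection procedure
```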
SDT: the quantile function (i.e., the inverse of the cumulative distribution function) of the normal distribution was computed using qnorm(...)
in package stats; we estimated d’, β,
c, A’ and B’’D using
dprime(..., adjusted=TRUE) in package psycho
(Makowski, 2018) with the adjustment for
extreme values. Please see here
and here
for visual, non-technical explanations of SDT.
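A minimal sketch of these SDT computations (the response counts below are hypothetical, not from our data):

```r
library(psycho)

# Hypothetical response counts for one participant in an AX task:
n_hit <- 40; n_miss <- 10   # signal ("different") items
n_fa  <- 10; n_cr   <- 40   # non-signal ("same") items

# Parametric and non-parametric SDT indices, with the adjustment
# for extreme (0 or 1) hit and false-alarm rates:
idx <- dprime(n_hit, n_fa, n_miss, n_cr, adjusted = TRUE)
idx$dprime; idx$c; idx$beta; idx$aprime; idx$bppd

# The unadjusted d' and criterion location c computed directly,
# using the normal quantile function qnorm(...):
H  <- n_hit / (n_hit + n_miss)   # hit rate
FA <- n_fa  / (n_fa + n_cr)      # false-alarm rate
qnorm(H) - qnorm(FA)             # d'
-(qnorm(H) + qnorm(FA)) / 2      # c
```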
Regression: we tested the clustering by
family using mixed-effects models as implemented by
lmer(...) in package lme4 (Bates, Mächler, Bolker, & Walker, 2015)
with p-values as implemented in package lmerTest
(Kuznetsova, Brockhoff, & Christensen,
2017), and the intra-class correlation coefficient (ICC) was
estimated using icc(...) in package
performance (Lüdecke, Ben-Shachar,
Patil, Waggoner, & Makowski, 2021). Logistic regression is
implemented by glm(..., family=binomial("logit")) or
glmer(..., family=binomial("logit")) (as appropriate),
linear regression by lm(...) or lmer(...),
Poisson regression by glm(..., family=poisson()), and Beta
regression by glmmTMB(..., family=beta_family()) from
package glmmTMB (Brooks et al.,
2017).
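The mixed-effects part of this workflow might look as follows (the data frame d, the outcome pcr, the predictor x and the grouping factor family are hypothetical placeholders):

```r
library(lmerTest)     # loads lme4 and adds p-values for lmer(...) models
library(performance)
library(glmmTMB)

# Mixed-effects linear regression with a random intercept per family:
m <- lmer(pcr ~ x + (1 | family), data = d)
summary(m)    # fixed effects with lmerTest's p-values
icc(m)        # intra-class correlation coefficient for the clustering

# The same structure as a Beta regression (for outcomes in (0, 1)):
mb <- glmmTMB(pcr ~ x + (1 | family), data = d, family = beta_family())
```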
Mediation: we used
mediate(...) in package mediation (Tingley, Yamamoto, Hirose, Keele, & Imai,
2014) with 10,000 simulations and heteroskedasticity-consistent
standard errors for “classic” mediation modeling, and
psem(...) and dSep(...) from package
piecewiseSEM (Lefcheck, 2016)
for modeling the mediation through a piecewise Structural Equation
Modelling approach with d-separation.
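As a sketch of the “classic” mediation analysis (the variables x, m and y and the data frame d are hypothetical, and the linear models are illustrative):

```r
library(mediation)

fit_m <- lm(m ~ x, data = d)       # mediator model
fit_y <- lm(y ~ x + m, data = d)   # outcome model
med <- mediate(fit_m, fit_y, treat = "x", mediator = "m",
               sims = 10000, robustSE = TRUE)  # heteroskedasticity-consistent SEs
summary(med)   # ACME, ADE, total effect and proportion mediated
```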
Path models: we used the piecewise
Structural Equation Modelling approach as implemented by
psem(...) in package piecewiseSEM (Lefcheck, 2016) and lavaan (Rosseel, 2012).
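An illustrative piecewise SEM over hypothetical variables (a simple x → m → y chain, so that the directed-separation test of the omitted x → y path is informative):

```r
library(piecewiseSEM)

mod <- psem(
  lm(m ~ x, data = d),   # x -> m
  lm(y ~ m, data = d)    # m -> y
)
dSep(mod)      # tests of directed separation (here, y _||_ x | m)
summary(mod)   # path coefficients plus the d-separation tests
```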
Here we present some examples of responses on an AX task (such as the tone task here) where pcr fails to distinguish clearly different response patterns, but the SDT-derived discrimination and bias do.
Consider an AX task with equal numbers of signal and non-signal items, #(A ≠ X) = #(A = X), and several types of participant response strategies as given in Table 6. It can be seen that the first 6 strategies, some of which are clearly very different, have very similar percentages of correct responses (50% or very close to it), showing that relying on this measure alone might hide important inter-individual variation. Second, it can be seen that for the three extreme strategies 1, 3 and 5, the % correct responses is exactly 50% (as expected), but also that the sensitivity d’ and bias b are identically 0.0 and -1.0, respectively, while the non-parametric estimates completely fail for strategies 1 and 3; however, the criterion location c does differentiate between these three strategies in the correct direction and strength. Slightly relaxing the extreme strategies 1 and 3 and arguably making them more realistic (by allowing a small probability of just 1% of “error”) is enough to make all five measures of sensitivity and bias informative. The 6th strategy is also highly artificial but is correctly diagnosed by all 5 estimates. Finally, the last 2 strategies, while also extreme, might reflect actual participant behavior and are correctly diagnosed by the % correct responses and by the parametric estimates (but not by the non-parametric ones, whose estimation mostly fails). Therefore, this suggests that, in general, (a) the % correct responses by itself fails to disambiguate between clearly different strategies, but that (b) the parametric estimates d’ and c, when used together, do capture such differences correctly (the non-parametric estimates should work as well in non-extreme cases).
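The arithmetic behind these diagnoses can be sketched as follows (assuming equal numbers of signal and non-signal trials; the 1%-error rates correspond to the “relaxed” extreme strategies discussed above, not to actual participant data):

```r
# With equal numbers of signal and non-signal trials, the % correct,
# d' and criterion location c follow directly from the hit rate H and
# the false-alarm rate FA:
p_correct <- function(H, FA) (H + (1 - FA)) / 2
d_prime   <- function(H, FA) qnorm(H) - qnorm(FA)
crit_c    <- function(H, FA) -(qnorm(H) + qnorm(FA)) / 2

# Relaxed "always answer 'different'" strategy (1% errors)...
p_correct(0.99, 0.99)   # 0.5: 50% correct, indistinguishable from...
# ...the relaxed "always answer 'same'" strategy:
p_correct(0.01, 0.01)   # 0.5: also 50% correct
# but c separates the two strategies (d' is exactly 0 in both cases):
d_prime(0.99, 0.99); crit_c(0.99, 0.99)   # d' = 0, c = -2.33 (bias to "different")
d_prime(0.01, 0.01); crit_c(0.01, 0.01)   # d' = 0, c = +2.33 (bias to "same")
```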
CPU: Apple M3 (8 threads)
RAM (memory): 17.2 GB
R version 4.3.3 (2024-02-29)
Platform: aarch64-apple-darwin20 (64-bit)
locale: en_US.UTF-8||en_US.UTF-8||en_US.UTF-8||C||en_US.UTF-8||en_US.UTF-8
attached base packages: tools, parallel, stats, graphics, grDevices, utils, datasets, methods and base
other attached packages: benchmarkme(v.1.0.8), mediation(v.4.5.0), sandwich(v.3.1-1), mvtnorm(v.1.3-3), DHARMa(v.0.4.7), performance(v.0.13.0), glmmTMB(v.1.1.10), lmerTest(v.3.1-3), lme4(v.1.1-36), Matrix(v.1.6-5), DiagrammeR(v.1.0.11), viridis(v.0.6.5), viridisLite(v.0.4.2), sjPlot(v.2.8.17), gplots(v.3.2.0), gridExtra(v.2.3), ggrepel(v.0.9.6), piecewiseSEM(v.2.3.0), lavaanPlot(v.0.8.1), lavaan(v.0.6-19), psycho(v.0.6.1), mokken(v.3.1.2), poLCA(v.1.6.0.1), MASS(v.7.3-60.0.1), scatterplot3d(v.0.3-44), psych(v.2.4.12), factoextra(v.1.0.7), ggplot2(v.3.5.1), reshape2(v.1.4.4), dplyr(v.1.1.4), tidyr(v.1.3.1), pander(v.0.6.5) and knitr(v.1.49)
loaded via a namespace (and not attached): RColorBrewer(v.1.1-3), rstudioapi(v.0.17.1), jsonlite(v.1.8.9), datawizard(v.1.0.0), magrittr(v.2.0.3), TH.data(v.1.1-2), estimability(v.1.5.1), farver(v.2.1.2), nloptr(v.2.1.1), rmarkdown(v.2.29), vctrs(v.0.6.5), minqa(v.1.2.8), effectsize(v.1.0.0), base64enc(v.0.1-3), rstatix(v.0.7.2), htmltools(v.0.5.8.1), forcats(v.1.0.0), curl(v.6.1.0), haven(v.2.5.4), broom(v.1.0.7), Formula(v.1.2-5), sjmisc(v.2.8.10), sass(v.0.4.9), KernSmooth(v.2.23-26), bslib(v.0.8.0), DiagrammeRsvg(v.0.1), htmlwidgets(v.1.6.4), plyr(v.1.8.9), emmeans(v.1.10.6), zoo(v.1.8-12), cachem(v.1.1.0), TMB(v.1.9.16), igraph(v.2.1.2), iterators(v.1.0.14), lifecycle(v.1.0.4), pkgconfig(v.2.0.3), sjlabelled(v.1.2.0), R6(v.2.5.1), fastmap(v.1.2.0), rbibutils(v.2.3), digest(v.0.6.37), numDeriv(v.2016.8-1.1), rsvg(v.2.6.1), colorspace(v.2.1-1), Hmisc(v.5.2-2), ggpubr(v.0.6.0), labeling(v.0.4.3), httr(v.1.4.7), abind(v.1.4-8), mgcv(v.1.9-1), compiler(v.4.3.3), doParallel(v.1.0.17), withr(v.3.0.2), htmlTable(v.2.4.3), backports(v.1.5.0), carData(v.3.0-5), ggsignif(v.0.6.4), sjstats(v.0.19.0), gtools(v.3.9.5), caTools(v.1.18.3), pbivnorm(v.0.6.0), foreign(v.0.8-88), nnet(v.7.3-20), glue(v.1.8.0), quadprog(v.1.5-8), nlme(v.3.1-166), grid(v.4.3.3), checkmate(v.2.3.2), cluster(v.2.1.8), generics(v.0.1.3), lpSolve(v.5.6.23), gtable(v.0.3.6), data.table(v.1.16.4), hms(v.1.1.3), car(v.3.1-3), foreach(v.1.5.2), pillar(v.1.10.1), stringr(v.1.5.1), benchmarkmeData(v.1.0.4), splines(v.4.3.3), lattice(v.0.22-6), survival(v.3.8-3), tidyselect(v.1.2.1), reformulas(v.0.4.0), V8(v.6.0.0), stats4(v.4.3.3), xfun(v.0.50), MuMIn(v.1.48.4), visNetwork(v.2.1.2), stringi(v.1.8.4), yaml(v.2.3.10), boot(v.1.3-31), evaluate(v.1.0.3), codetools(v.0.2-20), tibble(v.3.2.1), cli(v.3.6.3), rpart(v.4.1.24), parameters(v.0.24.1), xtable(v.1.8-4), Rdpack(v.2.6.2), munsell(v.0.5.1), jquerylib(v.0.1.4), Rcpp(v.1.0.14), ggeffects(v.2.0.0), png(v.0.1-8), coda(v.0.19-4.1), bayestestR(v.0.15.0), 
bitops(v.1.0-9), scales(v.1.3.0), insight(v.1.0.1), purrr(v.1.0.2), rlang(v.1.1.4), multcomp(v.1.4-26) and mnormt(v.2.1.1)
This is an example note. Click the symbol at the end of the note to go back to where the note is called in the text.↩︎
Please note that some of the models used here are
computationally expensive, and even compiling this
Rmarkdown script might require a relatively powerful
machine. To help with this, and to ensure full replicability of our
results, we have cached some of these expensive sections in the
cached_results folder as XZ-compressed
RData files. However, it might happen that versions of some
of the packages different from those that we used here might not be
fully compatible with the saved RData files, resulting in
errors compiling this Rmarkdown script or errors
displaying/plotting the results. In this case, we recommend using the
exact same versions of R and of the packages that we used
(listed in the Session
information), or, if not possible, the deletion of the
offending RData files and the full recompilation of the
Rmarkdown script (which is smart enough to re-generate only
those missing cached results).↩︎